[Feature] Add OCR #3168
Replies: 9 comments 14 replies
-
I really like this feature on GPhotos!
-
I agree with @chriexpe. I have tons of meme pictures that are full of text, and after trying the Immich search engine for days, I found it is not easy to locate an image by the specific text it contains.
According to CLIP's own description, the model is not good at detecting text in images. So if the search engine integrated an OCR model, it could fill that gap in CLIP and produce much better search results. There is a closed PR that could be a good implementation reference (#1200). For example, there are several images I cannot find by searching for the text they contain.
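To make the idea concrete, here is a minimal sketch (my own illustration, not code from #1200 or Immich) of how OCR'd text could be stored alongside the existing CLIP index and queried with plain full-text search, with the hits then merged into the CLIP ranking:

```python
# Minimal sketch: store OCR output per asset in an SQLite FTS5 table and
# match it with full-text search; a real integration would merge these
# hits with the CLIP vector-similarity results. Table and column names
# here are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE asset_text USING fts5(asset_id, ocr_text)")
db.execute(
    "INSERT INTO asset_text VALUES ('img_001', 'final boarding call gate B42')"
)

def ocr_search(query: str) -> list[str]:
    # Returns asset ids whose OCR'd text matches the query terms.
    rows = db.execute(
        "SELECT asset_id FROM asset_text WHERE asset_text MATCH ?", (query,)
    ).fetchall()
    return [row[0] for row in rows]

print(ocr_search("boarding"))  # -> ['img_001']
```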
-
This is also a feature I use heavily. It would be supremely useful.
-
Would be great to have that.
-
Indeed, the OCR feature is very useful in cloud albums and I hope it will be added in subsequent updates!
-
For me, OCR is a major feature and part of my document-archive workflow. My scanned documents are in G Drive, not in G Photos, but my screenshots are in G Photos and get scanned there. So maybe I should think about storing screenshots in G Drive instead of G Photos; then the OCR results in Photos would be much cleaner. Anyway, OCR is important.
-
At least until OCR is officially added, here is something you can get right now with the CLIP model: the CLIP models Immich supports can read text in images in a built-in way, but not with high accuracy. Some models are trained on more data and some on less, with different levels of accuracy.

If you still want OCR-like behavior (although I wouldn't recommend relying on CLIP for it), a model like laion2b_s32b_b79k performs well on the 38-dataset average and also on SST2, so it may suit you. For comparison, the base model Immich is set to work with (ViT-B-32__openai) scored only 0.5865 on SST2, compared to 0.6392 for laion2b_s32b_b79k.

Important note: different models require different amounts of computing power (FLOPs), so you should check whether your system can handle a model before choosing to switch. Another note: it is necessary to check whether the model exists in the Immich model list. Sources regarding the SST2 tests can be found here:
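For anyone who wants to gauge a candidate model on their own screenshots before changing Immich's setting, a quick comparison with the open_clip library might look like the sketch below. This is my own sketch, not Immich code; "screenshot.png" and the prompt are placeholders, and ViT-H-14 is the architecture that pairs with the laion2b_s32b_b79k weights in open_clip:

```python
# Sketch: score one image against a text prompt with a CLIP variant, to
# compare how well different checkpoints "read" text in screenshots.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")

image = preprocess(Image.open("screenshot.png")).unsqueeze(0)
text = tokenizer(["a screenshot with the words 'invoice total'"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    print(f"cosine similarity: {(image_features @ text_features.T).item():.4f}")
```

Running the same script with "ViT-B-32" and pretrained="openai" gives a baseline score to compare against.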
-
Voyage AI just released a very promising model for multimodal (images and document screenshots) embedding search.
-
The feature
OCR is the only thing that keeps bringing me back to Google Photos: I have A TON of screenshots, memes, and documents, so it's really handy to find exactly what I want.
I've seen someone here using the PaddlePaddle/PaddleOCR repo for OCR; hopefully it can help with integrating this into Immich!
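For reference, a minimal PaddleOCR run looks roughly like this (based on the repo's quick-start; "meme.png" is a placeholder, and the exact shape of the result can vary between PaddleOCR versions):

```python
# Sketch: extract text lines (with confidence scores) from one image.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")  # downloads models on first run
result = ocr.ocr("meme.png", cls=True)

for box, (text, confidence) in result[0]:
    print(f"{confidence:.2f}  {text}")
```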
Thank you guys for this amazing project!
Platform