[Feature] Add OCR #3168
Replies: 9 comments 14 replies
-
I really like this feature on GPhotos!
-
I agree with @chriexpe. I have tons of meme pictures that are full of text, and after trying the Immich search engine for days, I found it is not easy to locate an image by the specific text it contains.
According to CLIP's own description, the model is not good at detecting text in images. So if the search engine integrated an OCR model, it could fill that gap in CLIP and produce much better search results. There is a closed PR that could be a good implementation reference (#1200). For example, there are several images I cannot find by searching for the text they contain.
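To make the idea concrete, here is a minimal sketch (my own illustration, not code from #1200 or Immich) of how OCR'd text could be stored alongside the existing CLIP index and queried with plain full-text search, with the hits then merged into the CLIP ranking:

```python
# Minimal sketch: store OCR output per asset in an SQLite FTS5 table and
# match it with full-text search; a real integration would merge these
# hits with the CLIP vector-similarity results. Table and column names
# here are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE asset_text USING fts5(asset_id, ocr_text)")
db.execute(
    "INSERT INTO asset_text VALUES ('img_001', 'final boarding call gate B42')"
)

def ocr_search(query: str) -> list[str]:
    # Returns asset ids whose OCR'd text matches the query terms.
    rows = db.execute(
        "SELECT asset_id FROM asset_text WHERE asset_text MATCH ?", (query,)
    ).fetchall()
    return [row[0] for row in rows]

print(ocr_search("boarding"))  # -> ['img_001']
```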
-
This is also a feature I use heavily. It would be supremely useful.
-
Would be great to have that.
-
Indeed, the OCR feature is very useful in cloud albums and I hope it will be added in subsequent updates!
-
For me, OCR is a major feature and part of my document-archive workflow. My scanned documents are in G Drive, not in G Photos, but my screenshots are in G Photos and get scanned there. So maybe I should think about storing screenshots in G Drive instead of G Photos; then the OCR results in Photos would be much cleaner. Anyway, OCR is important.
-
At least until OCR is officially added, here is something you can get right now with the CLIP model: the CLIP models Immich supports can read text in images in a built-in way, but not with high accuracy. Some models are trained on more data and some on less, with different levels of accuracy.

If you still want OCR-like behavior (although I wouldn't recommend relying on CLIP for it), a model like laion2b_s32b_b79k performs well on the 38-dataset average and also on SST2, so it may suit you. For comparison, the base model Immich is set to work with (ViT-B-32__openai) scored only 0.5865 on SST2, compared to 0.6392 for laion2b_s32b_b79k.

Important note: different models require different amounts of computing power (FLOPs), so you should check whether your system can handle a model before choosing to switch. Another note: it is necessary to check whether the model exists in the Immich model list. Sources regarding the SST2 tests can be found here:
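For anyone who wants to gauge a candidate model on their own screenshots before changing Immich's setting, a quick comparison with the open_clip library might look like the sketch below. This is my own sketch, not Immich code; "screenshot.png" and the prompt are placeholders, and ViT-H-14 is the architecture that pairs with the laion2b_s32b_b79k weights in open_clip:

```python
# Sketch: score one image against a text prompt with a CLIP variant, to
# compare how well different checkpoints "read" text in screenshots.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")

image = preprocess(Image.open("screenshot.png")).unsqueeze(0)
text = tokenizer(["a screenshot with the words 'invoice total'"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    print(f"cosine similarity: {(image_features @ text_features.T).item():.4f}")
```

Running the same script with "ViT-B-32" and pretrained="openai" gives a baseline score to compare against.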
-
Voyage AI just released a very promising model for multimodal (images and document screenshots) embedding search.
-
The feature
OCR is the only thing that keeps bringing me back to Google Photos: I have A TON of screenshots, memes, and documents, so it's really handy to find exactly what I want.
I've seen someone here using the PaddlePaddle/PaddleOCR repo for OCR; hopefully it can help with integrating this into Immich!
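For reference, a minimal PaddleOCR run looks roughly like this (based on the repo's quick-start; "meme.png" is a placeholder, and the exact shape of the result can vary between PaddleOCR versions):

```python
# Sketch: extract text lines (with confidence scores) from one image.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")  # downloads models on first run
result = ocr.ocr("meme.png", cls=True)

for box, (text, confidence) in result[0]:
    print(f"{confidence:.2f}  {text}")
```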
Thank you guys for this amazing project!
Platform