jabberjabberjabber / LLMOCR Public


A simple script that reads an image and outputs the text it finds, using a vision model and KoboldCPP.


LLMOCR

LLMOCR uses a local LLM to read text from images.

You can also change the instruction so that the LLM processes the image however your prompt directs.
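As a rough illustration of what sending an image plus a custom instruction to a local KoboldCPP server might look like, here is a minimal sketch. It is not taken from llm-ocr-gui.py. It assumes the KoboldAI-style /api/v1/generate route on the default port, an "images" field holding base64-encoded image data, and a response shaped like {"results": [{"text": ...}]}; check the KoboldCPP API documentation for your version before relying on any of these. The function names and the max_length value are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: KoboldCPP is listening on its default port and exposes the
# KoboldAI-style generate route. Adjust if your setup differs.
API_URL = "http://localhost:5001/api/v1/generate"


def build_payload(image_bytes: bytes, instruction: str, max_length: int = 512) -> dict:
    """Build a request body with one base64-encoded image and an instruction.

    The field names ("prompt", "images", "max_length") are assumptions about
    the server API, not something stated in this README.
    """
    return {
        "prompt": instruction,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "max_length": max_length,
    }


def ask_about_image(image_path: str, instruction: str) -> str:
    """Send the image and instruction to the server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), instruction)
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]
```

With a server running, a call such as ask_about_image("page.png", "Describe the layout of this page.") would return whatever the model generates (page.png is a placeholder file name). A real prompt may also need the chat template expected by the loaded model, which this sketch leaves out.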

Features

Local Processing: All processing is done locally on your machine.
User-Friendly GUI: Includes a graphical interface. All AI functionality is provided by KoboldCPP, a single executable.
GPU Acceleration: Uses Apple Metal, Nvidia CUDA, or AMD (Vulkan) hardware, if available, to greatly speed up inference.
Cross-Platform: Supports Windows, macOS ARM, and Linux.

Installation

Prerequisites

Python 3.8 or higher
KoboldCPP
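To confirm that the interpreter you plan to use satisfies the Python requirement, a small check like the following can help. This is only a sketch; the helper name meets_minimum is hypothetical and not part of this repository.

```python
import sys

# Minimum version stated in the prerequisites above.
MINIMUM = (3, 8)


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """Return True if the given version (default: the running interpreter) is at least `minimum`."""
    current = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return current >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is new enough.")
    else:
        print("Python 3.8 or higher is required; found", sys.version.split()[0])
```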

Windows Installation

Clone the repository or download the ZIP file and extract it.
Install Python for Windows.
Download KoboldCPP.exe and place it in the LLMOCR folder. If the file has a different name, rename it to KoboldCPP.exe.
Run llm_ocr.bat. It creates a Python environment and downloads the model weights. The download is large (6GB) and shows no progress bar, but it only happens once. When it finishes, KoboldCPP starts, and one of the terminal windows says "Please connect to custom endpoint at http://localhost:5001". The program is then ready.

Mac and Linux Installation

Clone the repository or download and extract the ZIP file.
Install Python 3.8 or higher if not already installed.
Create a new Python virtual environment and install the packages listed in requirements.txt.
Run KoboldCPP with the flag --config llm-ocr.kcppt
Wait until the model weights finish downloading and the terminal window says "Please connect to custom endpoint at http://localhost:5001".
Run llm-ocr-gui.py using Python.
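On either platform, the first run can take a while because of the model download. Instead of watching the terminal, you could poll the port until it accepts connections. The sketch below assumes the default address localhost:5001 mentioned above; the function name and the timeout values are hypothetical. An open port only shows that the server is listening, so treat the terminal message as the authoritative signal.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 5001,
                    timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    if wait_for_server():
        print("Server is accepting connections at http://localhost:5001")
    else:
        print("Timed out waiting for the server.")
```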

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

KoboldCPP for local AI processing
PyQt6 for the GUI framework
