LLMOCR uses a local LLM to read text from images. You can also change the instruction so the LLM uses the image in whatever way you prompt.
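For example, once the KoboldCPP backend described below is running, you can send an image together with a custom instruction. This is a minimal sketch, not code from this repository: it assumes KoboldCPP's native `/api/v1/generate` route and its `images` field of base64-encoded images (available for multimodal models); the filename and sampler settings are placeholders.

```python
# Minimal sketch: send an image and a custom instruction to a running
# KoboldCPP backend on the default endpoint from this README.
# Assumes KoboldCPP's native /api/v1/generate route, which accepts
# base64-encoded images for multimodal models.
import base64
import requests

with open("receipt.png", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # The instruction is plain text, so you can prompt for anything,
    # e.g. "Describe this image" instead of a transcription.
    "prompt": "Transcribe all text in this image exactly as written.\n",
    "images": [image_b64],
    "max_length": 512,
    "temperature": 0.1,  # keep output close to the literal text
}

resp = requests.post("http://localhost:5001/api/v1/generate",
                     json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```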
- Local Processing: All processing is done locally on your machine.
- User-Friendly GUI: Includes a graphical interface. All AI functionality is handled by KoboldCPP, a single executable.
- GPU Acceleration: Uses Apple Metal, NVIDIA CUDA, or AMD (Vulkan) hardware when available to greatly speed up inference.
- Cross-Platform: Supports Windows, macOS ARM, and Linux.
Requirements:

- Python 3.8 or higher
- KoboldCPP
Windows:

- Clone the repository or download the ZIP file and extract it.
- Install Python for Windows.
- Download KoboldCPP.exe and place it in the LLMOCR folder. If it is not named KoboldCPP.exe, rename it to KoboldCPP.exe.
- Run `llm_ocr.bat`. It will create a Python environment and download the model weights. The download is quite large (6 GB) and there is no progress bar, but it only needs to happen once. When it finishes, KoboldCPP will start and one of the terminal windows will say `Please connect to custom endpoint at http://localhost:5001`; it is then ready (a quick readiness check is sketched below).
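If you want to confirm the backend is up before using the GUI, you can poll the endpoint above. This is a small sketch, not part of this repository; the `/api/v1/model` route is part of the Kobold API that KoboldCPP implements, but treat the exact route as an assumption.

```python
# Sketch: poll the local KoboldCPP endpoint until the model has loaded.
import time
import requests

def wait_for_backend(url="http://localhost:5001/api/v1/model", tries=60):
    """Return True once the backend answers, retrying every 5 seconds."""
    for _ in range(tries):
        try:
            r = requests.get(url, timeout=2)
            if r.ok:
                print("Backend ready:", r.json().get("result"))
                return True
        except requests.ConnectionError:
            pass  # server not listening yet
        time.sleep(5)
    return False

if __name__ == "__main__":
    if not wait_for_backend():
        raise SystemExit("KoboldCPP did not come up on http://localhost:5001")
```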
macOS and Linux:

- Clone the repository or download and extract the ZIP file.
- Install Python 3.8 or higher if not already installed.
- Create a new Python environment and install the dependencies with `pip install -r requirements.txt`.
- Run KoboldCPP with the flag `--config llm-ocr.kcppt`.
- Wait until the model weights finish downloading and the terminal window says `Please connect to custom endpoint at http://localhost:5001`.
- Run `llm-ocr-gui.py` using Python (these last steps can also be scripted; see the sketch below).
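The launch sequence above can be wrapped in one script if you prefer. This is a hedged sketch: the `./koboldcpp` binary name and location are assumptions, so adjust them for your install.

```python
# Sketch: start KoboldCPP with the bundled config, wait for it, then run
# the GUI. The ./koboldcpp path is an assumption; adjust for your system.
import subprocess
import sys

backend = subprocess.Popen(["./koboldcpp", "--config", "llm-ocr.kcppt"])
try:
    input("Press Enter once the terminal says 'Please connect to custom "
          "endpoint at http://localhost:5001' ... ")
    # Launch the GUI with the same Python interpreter running this script.
    subprocess.run([sys.executable, "llm-ocr-gui.py"], check=True)
finally:
    backend.terminate()  # shut the backend down when the GUI exits
```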
This project is licensed under the MIT License - see the LICENSE file for details.