📄 OCR Reader, 🔍 Analyzer, and 💬 Chat Assistant using 🔎 Zerox, 🧠 GPT-4o, powered by 🚀 AI/ML API

What I Built

I built an OCR Document Reader that allows users to upload and extract text from various document types such as PDFs, Word, and documents. The app utilizes the Zerox library for Optical Character Recognition (OCR) and integrates the AI/ML API's GPT-4o model for advanced text analysis. With features like support for multiple document formats, text analysis, and an interactive interface built with Gradio 5.0, this app simplifies the process of extracting and analyzing text from complex documents.

Limitations

Processing Time: Enabling the maintain_format option can slow down processing due to sequential requests needed to preserve formatting.
API Constraints: The app's capabilities depend on the limitations of the AI/ML API plan, such as request quotas and document size restrictions.
System Dependencies: Requires installation of system packages like poppler-utils, which may not be straightforward on all platforms.

Demo

Here are some key features of the app:

Upload Documents:
Users can upload PDFs, Word documents, or images for OCR processing.
Extracted Text Display:
The extracted text is displayed within the app, with options to copy or download it.
Maintain Formatting:
Optionally preserve the original document's formatting in the extracted text.

Tech Stack

Python: Core programming language.
Gradio 5.0: For building the user-friendly interface.
Zerox: Library used for OCR processing.
AI/ML API: Provides the GPT-4o model for text analysis.
LiteLLM: Used under the hood for model interactions.

More Details

Zerox Library: Transforms uploaded documents into images and performs OCR to extract text.
AI/ML API's GPT-4o: Analyzes the extracted text, enabling advanced features like summarization or content analysis.
Gradio Interface: Offers an intuitive web-based UI for users to interact with the app seamlessly.

Future Improvements

Batch Processing: Enable users to upload and process multiple documents at once.
Advanced Formatting Preservation: Improve the ability to retain complex layouts, tables, and graphics.
User Accounts: Implement authentication to allow users to save and manage their processed documents.
Cloud Integration: Add options to upload documents from and save results to cloud storage services.

Running the Repository

To run this project locally, follow these steps:

# 1. Clone the repository
git clone https://github.com/jadouse5/ocr-aimlapi-zerox.git
cd ocr-document-reader

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Install system dependencies
# On Ubuntu/Linux
sudo apt-get update
sudo apt-get install -y poppler-utils

# On macOS (using Homebrew)
brew install poppler

# 4. Set up environment variables
# Create a .env file in the root directory and add:
OPENAI_API_KEY=your_api_key
OPENAI_API_BASE=https://api.aimlapi.com/v1  # Adjust if necessary

# 5. Run the application
python ocr_app.py

# 6. Open your browser and navigate to
http://localhost:7860

Note: Replace your_api_key with your actual API key for the AI/ML API.

Hashtags

#OCR #AI #Gradio #Python #GPT4o #Zerox #TextAnalysis #MachineLearning

Feel free to customize this README with your own links, images, and additional details to better suit your project. This template follows the structure of the example you provided and highlights the key aspects of your OCR Document Reader application.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
__pycache__		__pycache__
myenv		myenv
.DS_Store		.DS_Store
.env		.env
.gitignore		.gitignore
Docker		Docker
README.md		README.md
app.py		app.py
ocr-aimlapi-zerox.gif		ocr-aimlapi-zerox.gif
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📄 OCR Reader, 🔍 Analyzer, and 💬 Chat Assistant using 🔎 Zerox, 🧠 GPT-4o, powered by 🚀 AI/ML API

Limitations

Demo

Tech Stack

More Details

Future Improvements

Running the Repository

Hashtags

About

Releases

Packages

Languages

jadouse5/ocr-gradio-aimlapi

Folders and files

Latest commit

History

Repository files navigation

📄 OCR Reader, 🔍 Analyzer, and 💬 Chat Assistant using 🔎 Zerox, 🧠 GPT-4o, powered by 🚀 AI/ML API

Limitations

Demo

Tech Stack

More Details

Future Improvements

Running the Repository

Hashtags

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages