Doc Master 📚

A powerful, lightweight Python library for automated file reading and content extraction. Doc Master simplifies the process of reading various file formats into string representations, making it perfect for data processing, content analysis, and document management systems.

🚀 Features

- Universal File Reading: Seamlessly handle multiple file formats, including:
  - PDF documents
  - Microsoft Word documents (.docx)
  - Excel spreadsheets
  - Text files
  - XML documents
  - Images (with base64 encoding)
  - Binary files
- Smart Format Detection: Automatic file type detection and appropriate processing
- Flexible Output: Choose between string or dictionary output formats
- Batch Processing: Process entire folders of documents efficiently
- Encoding Detection: Smart encoding detection for text files
- Enterprise-Ready: Built with stability and performance in mind
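
To make two of the features above more concrete (encoding detection for text files and base64 encoding for images), here is a minimal standard-library sketch. It is not Doc Master's actual implementation: the helper names `decode_text` and `image_to_base64` and the list of candidate encodings are assumptions chosen for illustration.

```python
import base64

# Hypothetical candidate list; Doc Master's real detection logic may differ.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def decode_text(raw: bytes) -> str:
    """Try each candidate encoding in order and return the first clean decode."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte value, so this is reached only if the list changes.
    return raw.decode("utf-8", errors="replace")


def image_to_base64(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")
```

Trying strict UTF-8 first and falling back to a single-byte encoding is a common heuristic, but it cannot tell single-byte encodings apart from one another, so treat the result as a best guess.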

📦 Installation

pip install -U doc-master

🔧 Quick Start

from doc_master import doc_master

# Read all files in a folder
results = doc_master(folder_path="path/to/folder", output_type="dict")

# Or read a single file
content = doc_master(file_path="path/to/file.docx")
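
The shape of the `output_type="dict"` result is not described here. As a rough mental model only, the sketch below assumes a mapping from file name to extracted text and shows how such a mapping could be built for plain-text files with the standard library. The function `read_folder_as_dict` is a hypothetical stand-in, not part of Doc Master.

```python
from pathlib import Path
from typing import Dict


def read_folder_as_dict(folder_path: str) -> Dict[str, str]:
    """Map each file name directly inside folder_path to its text content."""
    results: Dict[str, str] = {}
    for path in sorted(Path(folder_path).iterdir()):
        # Skip subdirectories; this sketch does not recurse.
        if path.is_file():
            results[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return results
```

If the real return value is keyed differently (for example by full path), adapt any code that iterates over it accordingly.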

📋 Requirements

Python 3.8+
pandas
pypdf
python-docx
Pillow

🤝 Contributing

We love your input! We want to make contributing to Doc Master as easy and transparent as possible. Here's how you can help:

Fork the repo
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Check out our Contributing Guidelines for more details.

🌟 Support the Project

If you find Doc Master useful, please consider:

Starring the repository ⭐
Following us on GitHub
Joining our Discord community
Sharing the project with others

📖 Documentation

For detailed documentation, visit our Wiki.

Basic Usage Examples

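# Assumed import path: the examples use these names without showing
# where they come from, so adjust this line if they live in a submodule.
from doc_master import AutoFileReader, doc_master, read_single_file

# Read a PDF file
content = read_single_file("document.pdf")

# Read an Excel file with specific sheet
reader = AutoFileReader()
content = reader.read_file("spreadsheet.xlsx", sheet_name="Data")

# Process a folder of documents
results = doc_master(
    folder_path="documents/",
    output_type="dict"
)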

🔍 Error Handling

The library includes error handling for files it cannot read. Wrap calls in try/except to catch anything it raises:

try:
    content = read_single_file("file.pdf")
except Exception as e:
    print(f"Error processing file: {e}")
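
When processing many files, one unreadable file should not abort the whole run. The sketch below shows one way to collect failures per file and keep going. It is an illustration under assumptions: `read_many` and `read_text` are hypothetical helpers, and `read_text` is a plain-text stand-in for whichever reader call you actually use.

```python
from typing import Callable, Dict, Iterable, Tuple


def read_many(
    paths: Iterable[str], reader: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Apply reader to each path, separating successes from failures."""
    contents: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            contents[path] = reader(path)
        except (OSError, ValueError) as exc:
            # Record the failure and continue with the remaining files.
            errors[path] = f"{type(exc).__name__}: {exc}"
    return contents, errors


def read_text(path: str) -> str:
    """Stand-in reader for this sketch; reads a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Catching `OSError` and `ValueError` covers missing files, permission problems, and decode errors in this stand-in; the exception types a real reader raises may differ, so widen or narrow the tuple to match.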

🛣️ Roadmap

Add OCR capabilities for image processing
Support for additional file formats
Performance optimizations for large files (multi-threading)
Async file processing
Command-line interface (CLI)

💬 Community and Support

Join our Discord server for discussions and support
Check out our GitHub Issues for bug reports and feature requests
Follow our GitHub Discussions for general questions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

All our amazing contributors
The open-source community
The Swarm Corporation team

Made with ❤️ by The Swarm Corporation

⭐ Star us on GitHub!
