A document retrieval system that uses ChromaDB and MixedBread embeddings for semantic search. The project targets local deployment with the mixedbread-ai/mxbai-embed-large-v1 model.
- Local embedding generation using mixedbread-ai/mxbai-embed-large-v1
- Document preprocessing with markitdown
- Efficient document storage and retrieval using ChromaDB
- Metadata filtering support
- Fully offline-capable
- Python 3.8+
- Local copy of mixedbread-ai/mxbai-embed-large-v1 model
- Sufficient storage for document embeddings
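To make the features and requirements above concrete, here is a minimal sketch of indexing documents and running a metadata-filtered search. It is not this project's actual code: the function names (`index_documents`, `search`, `open_local_collection`), the `documents` collection name, and the document dictionary layout are all hypothetical. It assumes the `chromadb` and `sentence-transformers` packages are installed and that the model has been downloaded to a local directory. The query prefix follows the retrieval prompt recommended on the model card; verify it against your local copy.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Retrieval prompt suggested for mxbai-embed-large-v1 queries (assumption:
# taken from the model card; documents are embedded without a prefix).
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

Embedder = Callable[[Sequence[str]], List[List[float]]]


def index_documents(collection: Any, embed: Embedder,
                    docs: Sequence[Dict[str, Any]]) -> None:
    """Store texts, their embeddings, and metadata in a Chroma collection.

    Each doc is a dict with hypothetical keys "id", "text", and "metadata".
    """
    texts = [d["text"] for d in docs]
    collection.add(
        ids=[d["id"] for d in docs],
        documents=texts,
        embeddings=embed(texts),
        metadatas=[d["metadata"] for d in docs],
    )


def search(collection: Any, embed: Embedder, query: str, n_results: int = 3,
           where: Optional[Dict[str, Any]] = None
           ) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Return (id, text, metadata) tuples, optionally filtered by metadata."""
    result = collection.query(
        query_embeddings=embed([QUERY_PREFIX + query]),
        n_results=n_results,
        where=where,  # e.g. {"source": "manual.pdf"}; None means no filter
    )
    # Chroma returns one result list per query; only one query was sent.
    return list(zip(result["ids"][0], result["documents"][0],
                    result["metadatas"][0]))


def open_local_collection(db_path: str, model_path: str,
                          name: str = "documents") -> Tuple[Any, Embedder]:
    """Open a persistent collection and build an embedder from a local model."""
    # Imported lazily so the helpers above work without these packages.
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_path)  # path to the local model copy
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name=name)

    def embed(texts: Sequence[str]) -> List[List[float]]:
        return model.encode(list(texts), normalize_embeddings=True).tolist()

    return collection, embed


# Example wiring (paths are placeholders):
# collection, embed = open_local_collection("./chroma_db", "./models/mxbai-embed-large-v1")
# index_documents(collection, embed, [
#     {"id": "1", "text": "Reset the device by ...", "metadata": {"source": "manual.pdf"}},
# ])
# hits = search(collection, embed, "how do I reset?", where={"source": "manual.pdf"})
```

Passing the embeddings explicitly, rather than relying on ChromaDB's default embedding function, keeps the local model as the only source of vectors, which is what allows the pipeline to run offline.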
- Clone this repository:

      git clone <repository-url>
      cd document-retriever