This project is a web-based application that lets you chat with multiple PDF documents. It extracts text from the PDFs, splits it into chunks, creates embeddings for the chunks, and uses them to give conversational responses to user questions.
- Upload and analyze multiple PDF documents.
- Extract text from PDF documents.
- Split large text into manageable chunks.
- Create embeddings for text chunks.
- Provide conversational responses to user questions.
- User-friendly interface powered by Streamlit.
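
As an illustration of the chunking step listed above, here is a minimal sketch in plain Python. It is not the code this application uses (the application relies on langchain for splitting). The function name `split_text` and the `chunk_size` and `overlap` defaults are assumptions chosen for the example. Overlapping chunks help keep a sentence that straddles a boundary retrievable from at least one chunk.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters. Hypothetical helper for
    illustration only; names and defaults are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk would then be passed to an embedding model. Smaller chunks give more precise matches, while larger chunks keep more context per match.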
Before you begin, make sure you have the following dependencies installed:
- Python (3.7 or higher)
- Streamlit
- python-dotenv (imported as dotenv)
- PyPDF2
- langchain
- htmx
- A Hugging Face Hub account and API token, used to retrieve models from the Hugging Face Hub.
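
The API token is usually kept out of the source code and read from the environment. The sketch below shows one way to do that, under two assumptions: the token is stored in a variable named `HUGGINGFACEHUB_API_TOKEN` (the name langchain's Hugging Face Hub integration reads), and a `.env` file in the project directory may hold it. The helper `get_hf_token` is hypothetical and not part of this repository.

```python
import os
from typing import Mapping, Optional

# Assumed variable name; adjust it if your setup uses a different one.
TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face API token, or raise if it is missing.

    Hypothetical helper for illustration. When no mapping is passed, it
    loads a local .env file (if python-dotenv is installed) and then reads
    the process environment.
    """
    if env is None:
        try:
            from dotenv import load_dotenv  # provided by python-dotenv
            load_dotenv()  # no-op when there is no .env file
        except ImportError:
            pass  # fall back to variables already exported in the shell
        env = os.environ

    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_VAR} is not set; add it to a .env file or export it."
        )
    return token
```

With this in place, a `.env` file containing a single line of the form `HUGGINGFACEHUB_API_TOKEN=<your token>` should be enough. Keep that file out of version control.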
Clone this repository to your local machine and change into the project directory:

    git clone https://github.com/pawanparackal/Data-Science-Project/Chat_With_Multiple_PDFs.git
    cd Data-Science-Project

Upload your PDF documents using the file uploader in the sidebar.
Click the "Proceed" button to start the analysis of the uploaded PDFs.
The application will extract text from the PDFs, split it into chunks, and create embeddings.
You can ask questions in the chat input field.
The application will provide conversational responses based on the content of the uploaded PDFs.

To contribute:
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with clear messages.
- Push your changes to your fork.
- Submit a pull request to the main repository.