Welcome to the Speak-To-Docs project repository, organized by the Microsoft Learn Student Ambassadors for Hacktoberfest 2024! This repository is dedicated to building and enhancing a Speech-Enabled Retrieval-Augmented Generation (RAG) Solution, dubbed "Speak-To-Docs." We're excited to have you contribute and improve this innovative project.
- Clone the Repository:
  From your terminal, clone your fork of the repository into a directory named `speak-to-docs`.
  ```shell
  # Replace {user_name} with your GitHub username
  git clone https://github.com/{user_name}/speak-to-docs.git speak-to-docs
  ```
- Set Up Virtual Environment:
  From the directory that contains your clone, create a virtual environment named `speak-to-docs`.
  ```shell
  # Windows
  python -m venv speak-to-docs
  # macOS or Linux
  python3 -m venv speak-to-docs
  ```
  Activate the virtual environment:
  ```shell
  # Windows
  speak-to-docs\Scripts\activate
  # macOS or Linux
  source speak-to-docs/bin/activate
  ```
  Install the required dependencies:
  ```shell
  cd speak-to-docs
  pip install -r requirements.txt
  ```
  If you plan to use Jupyter notebooks, add the virtual environment as a Jupyter kernel. This requires the `ipykernel` package; install it with `pip install ipykernel` if it is not already present.
  ```shell
  python -m ipykernel install --user --name=speak-to-docs
  ```
- Work on the Project:
  - This repository is specifically for the Speak-To-Docs RAG project. Explore the project structure and check the Issues tab for tasks or bugs you can address.
  - You are encouraged to review the current implementation and contribute new features or improvements to the Speech-Enabled RAG Solution.
- Commit and Push Your Changes:
  Once your contributions are ready, commit your changes and push them to your fork.
  ```shell
  git add .
  # Replace {COMMIT_MESSAGE} with a short description of your change
  git commit -m "{COMMIT_MESSAGE}"
  git push
  ```
- Submit a Pull Request:
  After pushing your changes, open a pull request to merge them into the main repository. Include a clear and concise description of what your contribution does.
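If a setup command fails or packages seem to install in the wrong place, the `speak-to-docs` virtual environment may not be active. The standard-library sketch below checks this: inside a `venv`, `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base Python installation. The function name `in_virtualenv` is hypothetical and not part of the repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the current interpreter runs inside a venv.

    A venv sets sys.prefix to the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate command first.")
```

With the environment activated, the script should print the path of the `speak-to-docs` environment.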
The Speech-Enabled RAG Solution is a voice-powered interface that lets users interact with their documents through speech. Think of it as a model that explains a document you want to read.
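The flow described above (a spoken question in, an answer grounded in the document, and speech back out) can be sketched as a single conversational turn. This is a minimal illustration under stated assumptions, not the project's implementation: `transcribe`, `generate_answer`, and `synthesize` are hypothetical stand-ins for the speech-to-text, language-model, and text-to-speech calls, `Session` and `handle_turn` are illustrative names, and retrieval uses simple word overlap in place of the embedding-based vector search a real RAG pipeline would typically use.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def _words(text: str) -> set[str]:
    # Lowercased word tokens: a crude stand-in for embeddings (assumption).
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q_words = _words(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & _words(c)), reverse=True)
    return ranked[:k]


@dataclass
class Session:
    """Hypothetical per-user state: document chunks plus chat history."""
    chunks: list[str]
    history: list[tuple[str, str]] = field(default_factory=list)


def handle_turn(
    audio: bytes,
    session: Session,
    transcribe: Callable[[bytes], str],
    generate_answer: Callable[[str, list[str]], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """One voice turn: speech -> text -> retrieval -> answer -> speech."""
    question = transcribe(audio)                  # speech-to-text
    context = retrieve(question, session.chunks)  # find relevant passages
    answer = generate_answer(question, context)   # answer grounded in context
    session.history.append((question, answer))    # remember the exchange
    return synthesize(answer)                     # text-to-speech


if __name__ == "__main__":
    # Fake services so the flow runs without any cloud resources.
    session = Session(chunks=[
        "The invoice is due on 30 November.",
        "Shipping takes five business days.",
    ])
    reply = handle_turn(
        b"fake-audio",
        session,
        transcribe=lambda audio: "When is the invoice due?",
        generate_answer=lambda question, context: context[0],
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(reply)  # b'The invoice is due on 30 November.'
```

Passing the speech and language-model calls in as functions keeps the turn logic testable without credentials; in practice each one would wrap a call to the corresponding Azure service.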
The project is structured as follows:
- `speech_to_docs`: The main directory for the project.
- `speech_to_docs/src`: Contains the source files that implement the project's core functionality: speech transcription and synthesis, the RAG solution, and document reading.
- `speech_to_docs/src/rag_functions.py`: Contains functions that check whether an uploaded file is compatible, including making sure it does not exceed a 50-page limit. It also processes several document types (PDF, PPTX, TXT) and extracts their content using Azure Document Intelligence. It provides detailed logging for error handling, tracks the extraction process, and saves the output in a user-friendly text format.
- `speech_to_docs/src/speech_io.py`: Handles the model's speech-to-text and text-to-speech functions using Azure Cognitive Services: Speech Transcription (Speech-to-Text) and Speech Synthesis (Text-to-Speech).
- `speech_to_docs/.gitignore`: Lists the files and folders that should not be pushed to GitHub (e.g. `.env`, `bin/`, etc.).
- `speech_to_docs/main.py`: The core interface of the Speech-Enabled RAG Solution. It enables voice interactions with documents, using Azure AI Services for speech transcription and synthesis, and manages user interactions and session state.
- `speech_to_docs/requirements.txt`: Lists the dependencies required to run the project.
- `speech_to_docs/README.md`: Contains information about the project, including this guide.
- `speech_to_docs/LICENSE`: Contains the license information for the project.
- `speech_to_docs/CONTRIBUTING.md`: Contains information about contributing to the project.
- `speech_to_docs/CODE_OF_CONDUCT.md`: Describes the purpose and policies of the code of conduct and the behaviour expected from everyone who participates in the project.
- `speech_to_docs/LEADERBOARD.md`: Contains the leaderboard, a ranking of the contributors with the most pull requests.
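The compatibility check described for `rag_functions.py` can be sketched as follows. This is an illustration under assumptions, not the file's actual code: the name `check_file_compatibility` is hypothetical, and the page count is passed in by the caller rather than read from the file (counting PDF or PPTX pages would need a third-party library). Only the three supported formats and the 50-page limit come from the description above.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

SUPPORTED_EXTENSIONS = {".pdf", ".pptx", ".txt"}  # formats listed above
MAX_PAGES = 50                                    # page limit listed above


def check_file_compatibility(filename: str, page_count: int) -> tuple[bool, str]:
    """Return (is_compatible, reason) for an uploaded file.

    Hypothetical helper: the caller supplies page_count, since reading it
    from a PDF or PPTX requires a document library.
    """
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        reason = f"Unsupported file type '{extension or 'none'}'"
        logger.error("%s: %s", filename, reason)
        return False, reason
    if page_count > MAX_PAGES:
        reason = f"{page_count} pages exceeds the {MAX_PAGES}-page limit"
        logger.error("%s: %s", filename, reason)
        return False, reason
    logger.info("%s passed the compatibility check", filename)
    return True, "OK"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(check_file_compatibility("report.pdf", 12))  # (True, 'OK')
    print(check_file_compatibility("slides.pptx", 80))
    print(check_file_compatibility("notes.docx", 3))
```

Returning a reason alongside the boolean lets the interface tell the user why a file was rejected, while the log records the same message for debugging.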
- Review the existing project code and issues to understand the functionality.
- Find an open issue that matches your skills or propose a new feature.
- Work on your contribution, test it thoroughly, and make sure it aligns with the project goals.
- Submit your pull request with a clear explanation of your contribution.
- Follow best practices for coding, including writing clean and well-documented code.
- Provide meaningful commit messages and detailed pull request descriptions.
- Respectfully collaborate and communicate with other contributors.
- Feel free to ask questions or seek guidance from project maintainers if needed.
Happy hacking! We can't wait to see your amazing contributions!
- How to Do Your First Pull Request
- Azure Document Intelligence
- Azure Document Intelligence: Code Implementation
- Use the fast transcription API (preview) with Azure AI Speech
- Quickstart: Convert text to speech
- Fundamentals of Azure OpenAI Service
- Azure OpenAI Models: Deployment
- Azure Speech Service documentation
- Develop Generative AI solutions with Azure OpenAI Service
- LangChain's DocArrayInMemorySearch Documentation