# README
Voice-to-text for the major languages supported by the Whisper model: this application transcribes uploaded or recorded audio in its original language, translates it to English, and summarises it in English.
# Solution Overview

This solution automates the extraction of key information from telephonic conversations conducted in various Indian languages, ensuring data security by processing audio and text locally. The system handles transcription, entity extraction, summarization, and translation of conversations without relying on cloud services, mitigating the risk of data breaches and leaks. It enhances customer service efficiency, reduces manual effort, and improves the accuracy of customer data extraction, all while addressing data privacy concerns.

**Targeted Customers:**

Non-Banking Financial Companies (NBFCs), Insurance companies, and medium/large call centers that operate in multilingual environments and handle a significant volume of customer calls.

**Supported Languages (by Whisper model)**
1. Indian Languages:
_Hindi, Kannada, Marathi and Tamil_

2. Other languages:
_Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh._



**Supported audio formats**

_m4a, mp3, webm, mp4, mpga, wav, mpeg_

**Detailed Approach:**
1. Speech Recognition

**Technology:** The Whisper model is employed for Automatic Speech Recognition (ASR). It can accurately transcribe conversations in multiple Indian languages, such as Hindi, Kannada, Marathi, and Tamil.

**Objective:** Convert audio recordings into text transcriptions in the original language of the conversation, ensuring high fidelity and accuracy.
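As an illustrative sketch of this step (file names and model size are assumptions, not project defaults), transcription with the open-source `whisper` package could look like:

```python
import os

# Audio formats listed in the "Supported audio formats" section above.
SUPPORTED_FORMATS = {".m4a", ".mp3", ".webm", ".mp4", ".mpga", ".wav", ".mpeg"}

def transcribe(audio_path: str, model_size: str = "small") -> dict:
    """Transcribe one audio file with Whisper; the result dict carries
    the transcribed 'text' and the detected 'language' code."""
    ext = os.path.splitext(audio_path)[1].lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext}")
    import whisper  # pip install -U openai-whisper; imported lazily since model loads are slow
    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)
```

Because everything runs through a locally loaded model, no audio leaves the machine, which is the core data-privacy requirement of this solution.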

2. Language Identification

**Technology:** Language identification is handled by LLaMA or a similar NLP-based model, trained to recognize various Indian languages from the transcribed text.

**Objective:** Automatically detect the language of the conversation and tag the transcription appropriately for further processing, enabling a multi-language system.
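Whisper itself also reports a language code for each file, which can seed the tagging step. A minimal sketch of the tagging logic (the mapping table here is illustrative, covering only the languages named above):

```python
# Map ISO-639-1 language codes (as returned by Whisper) to display tags.
LANGUAGE_TAGS = {
    "hi": "Hindi",
    "kn": "Kannada",
    "mr": "Marathi",
    "ta": "Tamil",
    "en": "English",
}

def tag_language(code: str) -> str:
    """Return a human-readable language tag, falling back to the raw code."""
    return LANGUAGE_TAGS.get(code, code)

print(tag_language("kn"))  # → Kannada
```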

3. Key Entity Extraction

**Technology:** LLaMA or other suitable NLP models are used for Named Entity Recognition (NER), identifying and extracting essential details such as names, addresses, phone numbers, account information, and other relevant entities from the transcriptions.

**Features:** The system extracts key entities in both the original language of the conversation and their corresponding English translations. If a specific entity is absent from the conversation, the system returns a blank or null value for it, ensuring seamless integration without unnecessary errors or noise in the data.
4. Translation

**Technology:** NLP translation models convert the extracted transcriptions and key entities into English, ensuring clarity and consistency for downstream processing.

**Objective:** Translate the conversation's core elements into English while retaining meaning and context, helping streamline operations across multilingual environments.
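When the translation step is driven by a local LLM, it reduces to prompt construction plus a model call. A minimal prompt template (the wording is an assumption, not the project's actual prompt):

```python
def build_translation_prompt(transcript: str, source_language: str) -> str:
    """Compose an instruction asking an LLM to translate a transcript to English
    while preserving names, numbers, and meaning."""
    return (
        f"Translate the following {source_language} call transcript to English, "
        f"preserving names, numbers, and meaning exactly:\n\n{transcript}"
    )

prompt = build_translation_prompt("नमस्ते, मेरा नाम आशा है।", "Hindi")
```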

5. Summarization

**Technology:** NLP-based summarization models are integrated to condense long conversations into brief, actionable summaries.

**Objective:** Provide concise summaries of calls, focusing on the key points and outcomes, improving operational efficiency by reducing the need to review entire conversations manually.
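Since the models are served locally (for example through Ollama), summarization never leaves the machine. A sketch of calling a locally served Llama model over Ollama's default REST endpoint (model name and prompt wording are assumptions):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def summarize(transcript: str, model: str = "llama3.1") -> str:
    """Ask a locally served model for a short English call summary; no cloud round-trip."""
    payload = {
        "model": model,
        "prompt": "Summarise the key points and outcome of this call in English:\n\n" + transcript,
        "stream": False,
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```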
6. Integration with Existing Systems

**Objective:** Integrate the solution with existing systems such as the customer’s CRM and call recording infrastructure, and automate updates of customer records with transcribed and summarized information, improving workflow and reducing manual effort.

**Proof of Concept (PoC) Plan:**

**Demo Objectives:** Build a demonstration that showcases the system's ability to transcribe, summarize, and extract entities from audio recordings, supporting live audio input or pre-recorded audio file uploads.

**Flow:**

1. Input: The system accepts an audio file in an Indian language.
2. Process:
   - Language identification and transcription using Whisper.
   - Entity extraction and summarization using LLaMA.
   - Optional translation of extracted data into English.
3. Output: The system provides a structured output, including the transcription, identified entities, and summary.
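The structured output of the flow above can be sketched as a single JSON-serialisable record (field names are illustrative, not the project's actual schema):

```python
import json

def build_output(language: str, transcript: str, entities: dict, summary: str) -> dict:
    """Bundle the pipeline results into one structured record."""
    return {
        "language": language,
        "transcription": transcript,
        "entities": entities,
        "summary": summary,
    }

record = build_output("hi", "नमस्ते ...", {"name": "Asha Rao"}, "Customer asked about loan status.")
print(json.dumps(record, ensure_ascii=False, indent=2))
```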

**Scenarios:**

1. Language Identification: The system should accurately identify the language of the uploaded audio file from the range of supported Indian languages.
2. Summarization: The solution must summarise the conversation in English.

**Technology Stack:**

- AI Models: Whisper for ASR, LLaMA for NLP tasks (entity extraction, language identification, and summarization).
- Backend: Python FastAPI for API creation and managing workflows.
- Frontend: NextJS for a dynamic and interactive user interface.

**[Backend Setup instructions](https://github.com/joshsoftware/lingo.ai/blob/dev/service/README.md)**

**[Frontend Setup instructions](https://github.com/joshsoftware/lingo.ai/blob/dev/app/README.md)**

