LOW-RANK ADAPTATION ON WHISPER MODEL FOR TAMIL ASR
This repository contains the code and related materials for the project "Low-Rank Adaptation on Whisper Model for Tamil ASR".
Advances in speech processing have led to significant improvements in technologies such as virtual assistants, automated transcription services, and real-time translation tools. This work focuses on extracting learnable features, improving accuracy, and reducing the computational cost of Tamil ASR.
Given the limited prior work on, and scarcity of datasets for, Dravidian languages, this project aims to fine-tune the publicly available Whisper Tamil model using Low-Rank Adaptation (LoRA).
The low-rank adaptation technique was implemented by fine-tuning the Whisper Small Tamil model on the publicly available Common Voice 13 Tamil dataset.
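The dataset can be loaded through the Hugging Face `datasets` library. The following is a minimal loading sketch, assuming the standard `mozilla-foundation/common_voice_13_0` dataset id on the Hub (which requires accepting the dataset terms once); it prints a record like the one shown below:

```python
from datasets import load_dataset, Audio

# Dataset id is an assumption based on the standard Common Voice 13
# release on the Hugging Face Hub.
common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "ta", split="train")

# Whisper expects 16 kHz input, so resample the audio column.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

print(common_voice[0])  # prints a record like the one shown below
```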
```
{'audio': {'path':'/root/.cache/huggingface/datasets/downloads/extracted/26d01089476eefe8f1950b403fe01fb35c17249845931bb35227afa2fe442bdd/ta_train_0/common_voice_ta_26650298.mp3', 'array': array([0., 0., 0., ..., 0., 0., 0.]), 'sampling_rate': 16000}, 'sentence': 'அவரைப் பொதுமக்கள் விடாமல் பின்னாலேயே துரத்திக் கொண்டே ஓடினார்கள்.'}
```
LoRA is applied to the key and output projection matrices when fine-tuning the Whisper Small Tamil model, leaving roughly 3M trainable parameters.
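A sketch of this setup using the `peft` library is shown below; the base checkpoint name is an assumption (substitute the Whisper Small Tamil checkpoint actually used), and the rank, α, and dropout values follow the settings reported later in this README:

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Checkpoint name is an assumption; replace with the Whisper Small
# Tamil checkpoint used in this project.
model = WhisperForConditionalGeneration.from_pretrained("vasista22/whisper-tamil-small")

# LoRA on the key and output projections of the attention layers,
# with the rank/alpha/dropout settings reported in this README.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["k_proj", "out_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # on the order of 3M trainable parameters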
The publicly available Common Voice 13 Tamil speech dataset was used to fine-tune the Whisper model for the Tamil transcription task. The implementation is based on the publicly available Hugging Face Transformers code base. All experiments were conducted on freely available NVIDIA T4 GPUs on the Google Colab platform.
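A hedged sketch of the training setup with `Seq2SeqTrainer` follows; all values not stated elsewhere in this README (output directory, batch size, learning rate) are illustrative assumptions, and `data_collator` stands in for a standard Whisper padding collator, with the feature-extraction preprocessing step omitted:

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

# Illustrative settings for a free Colab T4; values not reported in
# this README are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-ta-lora",
    per_device_train_batch_size=8,
    learning_rate=1e-3,
    max_steps=100,                 # matches global_step in the output below
    fp16=True,                     # T4 supports fp16 but not bf16
    remove_unused_columns=False,   # needed when the model is PEFT-wrapped
    label_names=["labels"],
)

trainer = Seq2SeqTrainer(
    model=model,                   # PEFT-wrapped Whisper model from above
    args=training_args,
    train_dataset=common_voice,    # assumes mapping to input features/labels
    data_collator=data_collator,   # hypothetical padding collator for Whisper
)
trainer.train()
```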
Typically, α is set to 64 and the rank r to 32. Singular values are pruned every ∆T steps (e.g., ∆T = 100) so that pruned triplets can still be updated within these intervals and possibly be reactivated in later iterations. The number of trainable parameters is controlled by the rank r and the number of adapted weight matrices n. The dropout rate is fixed at 0.05 for all experiments.
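This adaptive-rank pruning schedule corresponds to what the `peft` library exposes as `AdaLoraConfig`; the sketch below maps the stated values onto that config as an assumption (recent `peft` releases also expect a `total_step` argument):

```python
from peft import AdaLoraConfig

# Sketch of the adaptive-rank settings described above; mapping them
# onto peft's AdaLoraConfig is an assumption.
adalora_config = AdaLoraConfig(
    init_r=32,          # initial rank r
    lora_alpha=64,      # scaling factor alpha
    deltaT=100,         # prune singular values every deltaT = 100 steps
    lora_dropout=0.05,  # dropout fixed at 0.05
    target_modules=["k_proj", "out_proj"],
)
```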
```
TrainOutput(global_step=100, training_loss=0.1413337230682373, metrics={'train_runtime': 3350.6723, 'train_samples_per_second': 0.239, 'train_steps_per_second': 0.03, 'total_flos': 2.34945183744e+17, 'train_loss': 0.1413337230682373, 'epoch': 0.018453589223103892})
```
Evaluating the fine-tuned model on the Common Voice 13 Tamil test data gives a Word Error Rate (WER) of 39.44160.
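A sketch of how such a WER can be computed with the `evaluate` library is given below; it reuses the `model` from the earlier sketches, and the processor checkpoint name is the same assumption as above:

```python
import torch
import evaluate
from datasets import load_dataset, Audio
from transformers import WhisperProcessor

# Processor checkpoint name is an assumption, as above.
processor = WhisperProcessor.from_pretrained("vasista22/whisper-tamil-small")
wer_metric = evaluate.load("wer")

test_set = load_dataset("mozilla-foundation/common_voice_13_0", "ta", split="test")
test_set = test_set.cast_column("audio", Audio(sampling_rate=16000))

predictions, references = [], []
for sample in test_set:
    inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(input_features=inputs.input_features)
    predictions.append(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

# WER as a percentage of the reference word count
print(100 * wer_metric.compute(predictions=predictions, references=references))
```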
- Python 3.x
- NumPy
- Install the dependencies (Colab notebook syntax):
```
!pip install -q transformers datasets librosa evaluate jiwer gradio bitsandbytes==0.37 accelerate
!pip install -q git+https://github.com/huggingface/peft.git@main
!apt-get install -y nvidia-cuda-toolkit
!pip install bitsandbytes --upgrade
```
- Clone the repository:
```
git clone https://github.com/syed-azim-git/Tamil-ASR.git
```