"This repository is directly related to the AlignVSR paper. We will continue to maintain and improve the code in the future."
We preprocess the LRS2 and CNVSRC.Single datasets following the same approach as the AUTO-AVSR repository.
Then, following the steps from AUTO-AVSR (preparation), we process the LRS2 and CNVSRC.Single datasets to generate the corresponding train.csv and test.csv files.
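As a rough illustration of the output of this step, the sketch below writes a manifest row pairing each clip with its length and label token ids. The four-column layout (dataset name, relative path, frame count, space-separated token ids) is our assumption about the AUTO-AVSR-style format; the preparation scripts linked above are authoritative.

```python
# Minimal sketch of writing a train.csv manifest in the style produced by
# AUTO-AVSR's preparation scripts. The exact column layout is an assumption;
# follow the linked preparation steps for the real format.
import csv

def write_manifest(rows, out_path):
    """rows: iterable of (dataset, rel_path, num_frames, token_ids) tuples."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for dataset, rel_path, num_frames, token_ids in rows:
            writer.writerow(
                [dataset, rel_path, num_frames, " ".join(map(str, token_ids))]
            )

# Hypothetical example row for an LRS2 clip
write_manifest(
    [("lrs2", "trainval/6300370419826092098/00001.mp4", 94, [12, 7, 441, 3])],
    "train.csv",
)
```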
For the LRS2 and CNVSRC.Single datasets, we randomly sample a portion of the training-set audio to train a k-means model with 200 clusters. For the specific steps, please refer to this link. After completing this step, we obtain the k-means model used in the next training phase.
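Below is a minimal sketch of this step, assuming the sampled audio has already been passed through HuBERT and dumped as per-utterance .npy feature arrays (n_frames x feat_dim); the linked instructions are the authoritative recipe, and the file paths and mini-batch settings here are illustrative.

```python
# Minimal sketch: fit a 200-cluster k-means model on dumped HuBERT features.
import glob

import joblib
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Load all sampled feature arrays and stack them into one matrix.
feats = np.concatenate(
    [np.load(p) for p in glob.glob("hubert_feats/*.npy")], axis=0
)

kmeans = MiniBatchKMeans(
    n_clusters=200,   # number of clusters used in this pipeline
    batch_size=10000,
    max_iter=100,
    n_init=20,
)
kmeans.fit(feats)
joblib.dump(kmeans, "kmeans_200.pkl")  # reused in the next phase
```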
We use the pre-trained HuBERT model and the trained k-means model to quantize the audio data. For the quantized audio, we use a Conformer as the encoder and train it in an ASR paradigm with a hybrid CTC/Attention loss. For detailed steps, please refer to this link.
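The sketch below illustrates the quantization half of this phase: frame-level HuBERT features are mapped to their nearest k-means centroid to produce discrete audio units. The checkpoint name and the hidden-layer index are illustrative assumptions, not the exact configuration from the linked steps.

```python
# Minimal sketch: quantize audio into discrete units with HuBERT + k-means.
import joblib
import torch
from transformers import HubertModel

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()
kmeans = joblib.load("kmeans_200.pkl")  # from the previous phase

@torch.no_grad()
def quantize(waveform_16khz):
    """waveform_16khz: (1, num_samples) tensor -> (num_frames,) unit ids."""
    out = hubert(waveform_16khz, output_hidden_states=True)
    feats = out.hidden_states[9].squeeze(0)      # one intermediate layer (assumed)
    return kmeans.predict(feats.cpu().numpy())   # ids in [0, 200)

units = quantize(torch.randn(1, 16000))  # dummy 1-second input
```

The Conformer encoder is then trained on these units in the usual hybrid fashion, i.e. minimizing a weighted sum of the CTC loss and the attention decoder's cross-entropy loss.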
After completing Phase 2, we use the quantized audio units as the keys (K) and values (V) in a cross-attention mechanism, with the video features as the queries (Q). In addition, we introduce a Local Align Loss to align the audio and video features at the frame level. For detailed steps, please refer to this link.
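A simplified sketch of this fusion is shown below: embedded audio units act as K/V, video features as Q, and a frame-level alignment term encourages each video frame to attend to its matching audio frame. The alignment loss here is a simplified stand-in for the paper's Local Align Loss, and all dimensions and names are illustrative.

```python
# Minimal sketch: audio units as K/V, video features as Q, plus a
# frame-level alignment loss on the attention weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualCrossAttention(nn.Module):
    def __init__(self, dim=256, num_units=200, num_heads=4):
        super().__init__()
        self.unit_emb = nn.Embedding(num_units, dim)  # embeds k-means unit ids
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_feats, audio_units):
        # video_feats: (B, Tv, dim) queries; audio_units: (B, Ta) discrete ids
        kv = self.unit_emb(audio_units)                # (B, Ta, dim) keys/values
        fused, attn_w = self.attn(video_feats, kv, kv)  # attn_w: (B, Tv, Ta)
        return fused, attn_w

def local_align_loss(attn_w, target_idx):
    """Push each video frame's attention toward its aligned audio frame.

    target_idx: (B, Tv) index of the audio frame aligned to each video frame.
    """
    log_w = torch.log(attn_w + 1e-8).flatten(0, 1)  # (B*Tv, Ta)
    return F.nll_loss(log_w, target_idx.flatten())

model = AudioVisualCrossAttention()
video = torch.randn(2, 50, 256)                 # dummy video features
audio = torch.randint(0, 200, (2, 100))         # dummy quantized audio units
fused, weights = model(video, audio)
loss = local_align_loss(weights, torch.randint(0, 100, (2, 50)))
```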