
Strange Output After Fine-Tuning Whisper Model "English" #100

Open
kerolos opened this issue Dec 10, 2024 · 1 comment


kerolos commented Dec 10, 2024

Description
After fine-tuning the Whisper model, the transcription output contains repetitive and incoherent results, such as "and and and..." and "foll foll foll...". Below are the dataset details, training logs, and inference outputs.

Dataset Details:

{"audio": {"path": "011311ce3e.wav"}, "sentence": "Comparison is made to previous <UNK>", "language": "English", "sentences": [{"start": 0.0, "end": 2.09, "text": "Comparison is made to previous "}], "duration": 31.0}
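One thing worth noting in this sample: the labeled segment ends at 2.09 s, but the clip's duration is 31.0 s, leaving roughly 29 s of audio with no transcription. Mismatches like this can teach the model to emit garbage for most of the audio. A minimal sanity check for such samples might look like the sketch below (the `check_sample` helper and the `max_gap` threshold are illustrative, not part of the repo):

```python
import json

def check_sample(line, max_gap=1.0):
    """Flag JSONL samples whose labeled segments cover far less audio than
    the stated duration. `max_gap` (seconds) is an arbitrary illustrative
    threshold. Returns (ok, unlabeled_seconds)."""
    sample = json.loads(line)
    labeled_end = max((s["end"] for s in sample.get("sentences", [])), default=0.0)
    gap = sample["duration"] - labeled_end
    return gap <= max_gap, gap

# The sample from the issue: segments end at 2.09 s, duration is 31.0 s.
line = ('{"audio": {"path": "011311ce3e.wav"}, '
        '"sentence": "Comparison is made to previous <UNK>", '
        '"language": "English", '
        '"sentences": [{"start": 0.0, "end": 2.09, '
        '"text": "Comparison is made to previous "}], '
        '"duration": 31.0}')
ok, gap = check_sample(line)
print(ok, round(gap, 2))  # False 28.91 — ~29 s of the clip is unlabeled
```

Filtering or re-segmenting samples that fail such a check before training may be worth trying.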

Training Details:

Eval Loss: 0.1604
Checkpoint: 16000
{'loss': 0.1605, 'grad_norm': 0.6655998826026917, 'learning_rate': 0.0003321754056152683, 'epoch': 2.01}
{'loss': 0.1504, 'grad_norm': 0.6994230151176453, 'learning_rate': 0.0003280886019044506, 'epoch': 2.02}
{'loss': 0.1496, 'grad_norm': 0.605705738067627, 'learning_rate': 0.0003240017981936328, 'epoch': 2.03}

Merging the LoRA adapter:

python merge_lora.py --lora_model=2024_11_26/whisper-large-v3/checkpoint-16000/ --output_dir=_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000

Inference Details:

python infer.py --audio_path=testset/en-AU/Medical/SonicHealth/male/MB3/Sound/9c294857-13d4-46ab-91b7-15debf011872.wav --model_path=_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000/whisper-large-v3-finetune --use_gpu True --language English

Warnings:

FutureWarning: `max_new_tokens` is deprecated.
FutureWarning: The input name `inputs` is deprecated; use `input_features` instead.
Whisper did not predict an ending timestamp. Unexpected behavior may occur.

Output:
[50.94-Nones] Auto-d . . . . and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and associated foll associated foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll
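Degenerate output like the above can be flagged automatically when evaluating a fine-tuned checkpoint, rather than by eyeballing transcripts. A simple heuristic (the `repetition_ratio` name and the `min_run` threshold are illustrative, not from the repo) is the fraction of words that sit inside long runs of the same word:

```python
def repetition_ratio(text, min_run=3):
    """Fraction of words inside a run of `min_run`+ consecutive identical
    words (e.g. 'and and and ...'). High values suggest a degenerate decode."""
    words = text.split()
    if not words:
        return 0.0
    repeated = 0
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and words[j] == words[i]:
            j += 1          # extend the run of identical words
        if j - i >= min_run:
            repeated += j - i
        i = j
    return repeated / len(words)

print(repetition_ratio("and and and and associated foll foll foll"))  # 0.875
```

Transcripts scoring above some threshold could then be routed for inspection or trigger a re-decode with different generation settings.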

Expected Behavior:

The fine-tuned model should produce coherent transcriptions based on the dataset.

Request for Assistance:

  • What might cause repetitive outputs during inference?
  • Could the warnings during inference indicate an issue with training or merging?
  • Any suggestions for resolving this problem?
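On the repetition question: one standard mitigation at inference time is n-gram blocking, exposed in Hugging Face transformers' `generate()` as `no_repeat_ngram_size` (alongside `repetition_penalty`). Whether `infer.py` surfaces these options is not shown here, but the idea behind the blocking is simple enough to sketch in pure Python:

```python
def banned_next_tokens(generated, n=3):
    """Return tokens that would complete an n-gram already present in
    `generated` — the mechanism behind `no_repeat_ngram_size` in
    transformers' generate(). Decoding sets these tokens' logits to -inf."""
    if len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):])   # last n-1 tokens being extended
    banned = set()
    for i in range(len(generated) - n + 1):
        # If this earlier window matches the current prefix, the token that
        # followed it would recreate an existing n-gram — ban it.
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

seq = ["and", "and", "and"]
print(banned_next_tokens(seq, n=3))  # {'and'} — another 'and' would repeat a 3-gram
```

This suppresses loops like "and and and ..." symptomatically; it does not fix a training-side cause such as mostly-unlabeled audio in the dataset.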
yeyupiaoling (Owner) commented:

@kerolos whisper-large-v3 is a model that often becomes unstable when fine-tuned, so we recommend using whisper-large-v2 instead.
