
Strange Output After Fine-Tuning Whisper Model "English" #100

Open
kerolos opened this issue Dec 10, 2024 · 1 comment


kerolos commented Dec 10, 2024

Description
After fine-tuning the Whisper model, the transcription output contains repetitive and incoherent results, such as "and and and..." and "foll foll foll...". Below are the dataset details, training logs, and inference outputs.

Dataset Details:

{"audio": {"path": "011311ce3e.wav"}, "sentence": "Comparison is made to previous <UNK>", "language": "English", "sentences": [{"start": 0.0, "end": 2.09, "text": "Comparison is made to previous "}], "duration": 31.0}
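One thing worth noting in this sample: the labeled segment ends at 2.09 s, but the clip's duration is 31.0 s, leaving roughly 29 s of audio with no transcription. Mismatches like this can teach the model to emit garbage for most of the audio. A minimal sanity check for such samples might look like the sketch below (the `check_sample` helper and the `max_gap` threshold are illustrative, not part of the repo):

```python
import json

def check_sample(line, max_gap=1.0):
    """Flag JSONL samples whose labeled segments cover far less audio than
    the stated duration. `max_gap` (seconds) is an arbitrary illustrative
    threshold. Returns (ok, unlabeled_seconds)."""
    sample = json.loads(line)
    labeled_end = max((s["end"] for s in sample.get("sentences", [])), default=0.0)
    gap = sample["duration"] - labeled_end
    return gap <= max_gap, gap

# The sample from the issue: segments end at 2.09 s, duration is 31.0 s.
line = ('{"audio": {"path": "011311ce3e.wav"}, '
        '"sentence": "Comparison is made to previous <UNK>", '
        '"language": "English", '
        '"sentences": [{"start": 0.0, "end": 2.09, '
        '"text": "Comparison is made to previous "}], '
        '"duration": 31.0}')
ok, gap = check_sample(line)
print(ok, round(gap, 2))  # False 28.91 — ~29 s of the clip is unlabeled
```

Filtering or re-segmenting samples that fail such a check before training may be worth trying.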

Training Details:

Eval Loss: 0.1604
Checkpoint: 16000
{'loss': 0.1605, 'grad_norm': 0.6655998826026917, 'learning_rate': 0.0003321754056152683, 'epoch': 2.01}
{'loss': 0.1504, 'grad_norm': 0.6994230151176453, 'learning_rate': 0.0003280886019044506, 'epoch': 2.02}
{'loss': 0.1496, 'grad_norm': 0.605705738067627, 'learning_rate': 0.0003240017981936328, 'epoch': 2.03}

Merging the LoRA adapter:

python merge_lora.py --lora_model=2024_11_26/whisper-large-v3/checkpoint-16000/ --output_dir=_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000

Inference Details:

python infer.py --audio_path=testset/en-AU/Medical/SonicHealth/male/MB3/Sound/9c294857-13d4-46ab-91b7-15debf011872.wav --model_path=_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000/whisper-large-v3-finetune --use_gpu True --language English

Warnings:

FutureWarning: `max_new_tokens` is deprecated.
FutureWarning: The input name `inputs` is deprecated; use `input_features` instead.
Whisper did not predict an ending timestamp. Unexpected behavior may occur.

Output:
[50.94-Nones] Auto-d . . . . and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and associated foll associated foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll foll
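Degenerate output like the above can be flagged automatically when evaluating a fine-tuned checkpoint, rather than by eyeballing transcripts. A simple heuristic (the `repetition_ratio` name and the `min_run` threshold are illustrative, not from the repo) is the fraction of words that sit inside long runs of the same word:

```python
def repetition_ratio(text, min_run=3):
    """Fraction of words inside a run of `min_run`+ consecutive identical
    words (e.g. 'and and and ...'). High values suggest a degenerate decode."""
    words = text.split()
    if not words:
        return 0.0
    repeated = 0
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and words[j] == words[i]:
            j += 1          # extend the run of identical words
        if j - i >= min_run:
            repeated += j - i
        i = j
    return repeated / len(words)

print(repetition_ratio("and and and and associated foll foll foll"))  # 0.875
```

Transcripts scoring above some threshold could then be routed for inspection or trigger a re-decode with different generation settings.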

Expected Behavior:

The fine-tuned model should produce coherent transcriptions based on the dataset.

Request for Assistance:

  • What might cause repetitive outputs during inference?
  • Could the warnings during inference indicate an issue with training or merging?
  • Any suggestions for resolving this problem?
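On the repetition question: one standard mitigation at inference time is n-gram blocking, exposed in Hugging Face transformers' `generate()` as `no_repeat_ngram_size` (alongside `repetition_penalty`). Whether `infer.py` surfaces these options is not shown here, but the idea behind the blocking is simple enough to sketch in pure Python:

```python
def banned_next_tokens(generated, n=3):
    """Return tokens that would complete an n-gram already present in
    `generated` — the mechanism behind `no_repeat_ngram_size` in
    transformers' generate(). Decoding sets these tokens' logits to -inf."""
    if len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):])   # last n-1 tokens being extended
    banned = set()
    for i in range(len(generated) - n + 1):
        # If this earlier window matches the current prefix, the token that
        # followed it would recreate an existing n-gram — ban it.
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

seq = ["and", "and", "and"]
print(banned_next_tokens(seq, n=3))  # {'and'} — another 'and' would repeat a 3-gram
```

This suppresses loops like "and and and ..." symptomatically; it does not fix a training-side cause such as mostly-unlabeled audio in the dataset.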
yeyupiaoling (Owner) commented:

@kerolos whisper-large-v3 is a model that often becomes unstable when fine-tuned, so we recommend using whisper-large-v2 instead.
