Description
After fine-tuning the Whisper model, the transcription output contains repetitive and incoherent results, such as "and and and..." and "foll foll foll...". Below are the dataset details, the training/merge commands, and the inference output.
Dataset Details:
{"audio": {"path": "011311ce3e.wav"}, "sentence": "Comparison is made to previous <UNK>", "language": "English", "sentences": [{"start": 0.0, "end": 2.09, "text": "Comparison is made to previous "}], "duration": 31.0}
Training Details:
Freezing (merging the LoRA adapter into the base model):
python merge_lora.py --lora_model=2024_11_26/whisper-large-v3/checkpoint-16000/ --output_dir=_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000
Inference Details:
python infer.py --audio_path=testset/en-AU/Medical/SonicHealth/male/MB3/Sound/9c294857-13d4-46ab-91b7-15debf011872.wav --model_path=_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000/whisper-large-v3-finetune --use_gpu True --language English
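As a quick sanity check on the merged checkpoint, this minimal sketch (assuming the merged directory is a standard Hugging Face Whisper export; the path is the one from the commands above) prints the generation config that decoding relies on:

from transformers import GenerationConfig

MERGED_DIR = "_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000/whisper-large-v3-finetune"

# Confirm the generation config (language/task/timestamp settings) survived the
# LoRA merge, since decoding behaviour depends on it.
gen_cfg = GenerationConfig.from_pretrained(MERGED_DIR)
print(gen_cfg)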
Warnings:
FutureWarning: `max_new_tokens` is deprecated.
FutureWarning: The input name `inputs` is deprecated; use `input_features` instead.
Whisper did not predict an ending timestamp. Unexpected behavior may occur.
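For reference, here is a minimal sketch of the kind of decoding call I assume infer.py makes (assumptions: the standard transformers Whisper API, 16 kHz mono audio, and a hypothetical test file name). It passes input_features rather than the deprecated inputs name and requests timestamps explicitly:

import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_DIR = "_whisper_Finetune/en/2024_11_26/whisper-large-v3/frezed_checkpoint_16000/whisper-large-v3-finetune"

processor = WhisperProcessor.from_pretrained(MODEL_DIR)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_DIR).eval()

audio, _ = librosa.load("sample.wav", sr=16000)  # hypothetical test file
features = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(
        input_features=features.input_features,  # new argument name, not `inputs`
        language="en",
        task="transcribe",
        return_timestamps=True,  # relates to the missing end-timestamp warning
    )
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])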
Output:
[50.94-Nones] Auto-d . . . . and and and and and and and and ("and" repeated many more times) associated foll associated foll foll foll foll foll ("foll" repeated many more times)
Expected Behavior:
The fine-tuned model should produce coherent transcriptions based on the dataset.
Request for Assistance:
What might cause repetitive outputs during inference?
Could the warnings during inference indicate an issue with training or merging?
Any suggestions for resolving this problem?