This project uses OpenAI's Whisper automatic speech recognition (ASR) model to transcribe audio from both a locally hosted video and a YouTube video. The transcriptions are saved as text files for further analysis.
- OpenAI Whisper: A powerful ASR model for transcribing speech.
- pytube: A lightweight, dependency-free Python library to download YouTube videos.
- moviepy: A video editing library for Python.
- ffmpeg: A multimedia framework to handle audio and video processing.
Install the required dependencies:
```shell
pip install -U openai-whisper
pip install pytube
pip install moviepy
```

Note that `ffmpeg` itself is a system tool rather than a Python package, so `pip install ffmpeg` will not provide the binary Whisper needs. Install it with your system's package manager instead (for example `sudo apt install ffmpeg` on Debian/Ubuntu or `brew install ffmpeg` on macOS).
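Since Whisper shells out to `ffmpeg` to decode audio, it is worth checking that the binary is actually on your `PATH` before running the transcription steps. A minimal sketch using only the standard library:

```python
import shutil

def has_tool(name):
    """Return True if an executable with this name is on the PATH."""
    return shutil.which(name) is not None

if not has_tool("ffmpeg"):
    print("ffmpeg not found - install it before running the transcription steps")
```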
Load the Whisper ASR model:
```python
import whisper

model = whisper.load_model("tiny")
```
Run the provided Python script:
Local Video Transcription:
```python
from moviepy.editor import VideoFileClip

# Extract the audio track from the local video
video = VideoFileClip("video.mp4")
video.audio.write_audiofile("myaudio.mp3")

# Transcribe the audio (fp16=False avoids the FP16 warning on CPU)
result_local = model.transcribe("myaudio.mp3", fp16=False)

# Save the transcription to a text file
with open("mySound_local.txt", "w") as file:
    file.write(result_local["text"])
print(result_local["text"])
```
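Besides the plain text, `model.transcribe` also returns per-segment timing under the `"segments"` key (a list of dicts with `start`, `end`, and `text` fields). A small sketch of turning those segments into timestamped lines; the `demo` data below is made up to mirror Whisper's output shape, not real model output:

```python
def fmt_time(seconds):
    """Format a time in seconds as HH:MM:SS for display."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

def timestamped_lines(segments):
    """Render Whisper-style segments as '[start - end] text' lines."""
    return [
        f"[{fmt_time(seg['start'])} - {fmt_time(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    ]

# Hand-made segments mirroring the structure of result["segments"]
demo = [
    {"start": 0.0, "end": 4.2, "text": " Hello and welcome."},
    {"start": 4.2, "end": 65.0, "text": " Today we look at Whisper."},
]
for line in timestamped_lines(demo):
    print(line)  # e.g. [00:00:00 - 00:00:04] Hello and welcome.
```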
YouTube Video Transcription:
```python
from pytube import YouTube

# Download the YouTube video
yt = YouTube('https://www.youtube.com/watch?v=Aq92xxwYwSU')
stream = yt.streams.get_highest_resolution()
video_path = stream.download()

# Extract the audio track from the downloaded video
video_youtube = VideoFileClip(video_path)
video_youtube.audio.write_audiofile("myaudio.mp3")

# Transcribe the audio and save the result
result_youtube = model.transcribe("myaudio.mp3", fp16=False)
with open("mySound_youtube.txt", "w") as file:
    file.write(result_youtube["text"])
print(result_youtube["text"])
```
Analyze the transcriptions saved in the generated text files (`mySound_local.txt` and `mySound_youtube.txt`).
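As a starting point for the analysis step, the two transcripts can be compared with a simple word-frequency count. A sketch using only the standard library; the sample strings here stand in for the contents of the generated files:

```python
from collections import Counter
import re

def word_counts(text):
    """Lowercase word-frequency count of a transcript."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Stand-ins for open("mySound_local.txt").read() and the YouTube file
local_text = "the quick brown fox"
youtube_text = "the lazy dog and the fox"

local_counts = word_counts(local_text)
youtube_counts = word_counts(youtube_text)

# Words that appear in both transcripts
common = sorted(set(local_counts) & set(youtube_counts))
print(common)  # → ['fox', 'the']
```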