extract_embedding.py
Description:
This script extracts audio embeddings from a set of audio files using a pre-trained ONNX model. It processes the audio files to convert them into feature representations, which are then fed into the model to obtain embeddings. The script supports multi-threading for efficient processing of multiple audio files.
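The multi-threaded processing described above could be organized roughly as follows. This is a minimal sketch using only the standard library, not the script's actual code: `extract_one` is a hypothetical stand-in for the real per-file work (loading the audio, computing features, running the ONNX model), and `extract_all` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_one(utt, wav_path):
    # Hypothetical placeholder: the real script would load the audio,
    # compute features, and run the ONNX model here. This stand-in just
    # returns a one-element "embedding" derived from the path length.
    return utt, [float(len(wav_path))]


def extract_all(utt2wav, num_thread=8):
    """Run extract_one over every utterance using a pool of worker threads."""
    utt2embedding = {}
    with ThreadPoolExecutor(max_workers=num_thread) as pool:
        futures = [
            pool.submit(extract_one, utt, path) for utt, path in utt2wav.items()
        ]
        # Collect results as workers finish; completion order is not guaranteed,
        # so results are keyed by utterance ID rather than by position.
        for future in as_completed(futures):
            utt, embedding = future.result()
            utt2embedding[utt] = embedding
    return utt2embedding
```

Keying the results by utterance ID means the output does not depend on which thread finishes first.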
Key Features:
- Reads the audio files listed in the `wav.scp` file and maps them to speaker identifiers from a corresponding `utt2spk` file.
- Saves `utt2embedding.pt` for individual utterance embeddings and `spk2embedding.pt` for averaged speaker embeddings.
Usage:
Arguments:
- `--dir`: The directory containing the input files (`wav.scp` and `utt2spk`).
- `--onnx_path`: The path to the ONNX model file used for generating embeddings.
- `--num_thread`: (Optional) The number of threads to use for parallel processing. Defaults to 8.
Dependencies:
- `torch`: For handling tensors and saving embeddings.
- `torchaudio`: For loading audio files and processing them.
- `onnxruntime`: For running the ONNX model.
- `torchaudio.compliance.kaldi`: For extracting Mel-frequency features.
- `tqdm`: For displaying progress during processing.
Output:
The script generates two files in the specified directory:
- `utt2embedding.pt`: A PyTorch tensor containing embeddings for each utterance.
- `spk2embedding.pt`: A PyTorch tensor containing averaged embeddings for each speaker.
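To illustrate how per-utterance embeddings relate to the averaged speaker embeddings, here is a minimal sketch. It assumes the Kaldi convention that `wav.scp` and `utt2spk` hold one `key value` pair per line, and it uses plain Python lists in place of tensors so that it runs without extra dependencies. The function names `read_kaldi_map` and `average_by_speaker` are illustrative and do not come from the script.

```python
from collections import defaultdict
from statistics import fmean


def read_kaldi_map(lines):
    """Parse 'key value' lines (assumed layout of wav.scp and utt2spk) into a dict."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(maxsplit=1)
        table[key] = value
    return table


def average_by_speaker(utt2embedding, utt2spk):
    """Group utterance embeddings by speaker and take the element-wise mean."""
    spk2embs = defaultdict(list)
    for utt, emb in utt2embedding.items():
        spk2embs[utt2spk[utt]].append(emb)
    # zip(*embs) walks the embeddings dimension by dimension.
    return {
        spk: [fmean(dim) for dim in zip(*embs)] for spk, embs in spk2embs.items()
    }
```

For example, two utterances from the same speaker with embeddings `[1.0, 2.0]` and `[3.0, 4.0]` average to `[2.0, 3.0]`. With real tensors, the same step would typically be a stack followed by a mean over the first dimension.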