Using 'num_speakers' in Meeting audio Diarization #1726
Unanswered
Daeinbangue asked this question in Q&A
Hello.
I am trying to diarize a meeting recording (about 3 minutes) with 5 speakers using pyannote Speaker Diarization 3.1.
When I don't set 'num_speakers', it diarizes the audio into 7 speakers. Most of the speech is segmented correctly, but some speakers are split across more than one label (e.g. speaker5 and speaker6 are the same person).
When I set 'num_speakers = 5', two other speakers who were originally labeled correctly get merged into one (e.g. speaker2 and speaker4 are combined as speaker2), while the speaker who was split across several labels is still split.
Here is a part of my pipeline:
Question 1: Fundamentally, what difference does setting 'num_speakers' make? Does it only simplify the clustering step, improving accuracy and processing time? I would like to understand the underlying logic.
Question 2: When diarization produces incorrect splits and merges like the ones described above, is there a way to improve the result without training (e.g. by tuning hyperparameters)?
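To make Question 1 concrete, here is a toy sketch of the behavior described above. It assumes that the pipeline clusters per-segment speaker embeddings agglomeratively and that 'num_speakers' forces the number of clusters instead of relying on a distance threshold. This is not pyannote's implementation: the 2-D points, the threshold of 1.0, and the helper names (`EMBEDDINGS`, `centroid`, `agglomerate`) are all invented for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D "embeddings" for short segments of 5 true speakers (A-E).
# Real speaker embeddings are high-dimensional; these values are invented.
EMBEDDINGS = {
    "A_seg1": (0.0, 0.0), "A_seg2": (0.2, 0.0),
    "B_seg1": (10.0, 0.0), "B_seg2": (10.2, 0.0),
    "C_seg1": (11.6, 0.0), "C_seg2": (11.8, 0.0),  # C sits close to B
    "D_seg1": (20.0, 0.0), "D_seg2": (20.2, 0.0),
    "D_seg3": (21.3, 0.0), "D_seg4": (21.5, 0.0),  # D drifts a little
    "E_seg1": (30.0, 0.0), "E_seg2": (30.2, 0.0),
    "E_seg3": (33.0, 0.0), "E_seg4": (33.2, 0.0),  # E drifts a lot
}


def centroid(names):
    """Mean embedding of the segments in one cluster."""
    points = [EMBEDDINGS[name] for name in names]
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def agglomerate(names, threshold=None, num_clusters=None):
    """Greedy centroid-linkage clustering (toy version).

    Repeatedly merges the two clusters with the closest centroids.
    If num_clusters is given, stop once that many clusters remain.
    Otherwise stop when the closest pair is farther apart than threshold.
    """
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        if num_clusters is not None and len(clusters) <= num_clusters:
            break
        # Closest pair of clusters; i < j because of combinations().
        gap, i, j = min(
            (dist(centroid(clusters[a]), centroid(clusters[b])), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if num_clusters is None and threshold is not None and gap > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def speakers_in(clusters):
    """True speaker letters found in each cluster."""
    return [sorted({name[0] for name in cluster}) for cluster in clusters]


auto = agglomerate(list(EMBEDDINGS), threshold=1.0)
forced = agglomerate(list(EMBEDDINGS), num_clusters=5)

print(len(auto), speakers_in(auto))
print(len(forced), speakers_in(forced))
```

With these invented points, the threshold run stops at 7 clusters because D and E are each split in two. Forcing 5 clusters triggers the two cheapest remaining merges: the two halves of D (correct), then B with C (wrong), because B and C are closer to each other than the two halves of E are. E therefore stays split. If the real pipeline behaves similarly, fixing the count only changes where merging stops, not which pairs are considered closest.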
Any advice would be really helpful.
Thank you.