Using 'num_speakers' in Meeting audio Diarization #1726

Daeinbangue · 2024-06-18T05:37:31Z

Hello.

I am trying to diarize a meeting recording (about 3 minutes) with 5 speakers using pyannote Speaker Diarization 3.1.

When I don't set 'num_speakers', it diarizes the audio into 7 speakers. Most of the speech is segmented correctly, but one speaker is split across two labels (e.g. speaker5 and speaker6 are the same person).

When I set 'num_speakers = 5', two other speakers who were originally separated correctly are merged into one (e.g. speaker2 and speaker4 are combined as speaker2), while the split of a single person across two labels still remains.
Here is part of my pipeline:

from pyannote.audio.pipelines import SpeakerDiarization

# Load finetuned models
# (model_606 and pipeline come from parts of the script not shown here)
pipeline_man = SpeakerDiarization(
    segmentation=model_606,
    embedding=pipeline.embedding,
    embedding_exclude_overlap=True,  # boolean rather than the string "True"
    clustering="AgglomerativeClustering",
    segmentation_dur=3.5,
)
pipeline_man.instantiate({
    "segmentation": {
        "min_duration_off": 0.0,
    },
    "clustering": {
        "method": "centroid",
        "min_cluster_size": 8,
        "threshold": 0.7045654963945799,
    },
})

Question 1: Fundamentally, what difference does setting 'num_speakers' make? Does it simplify the clustering process and thereby improve accuracy and processing time? I would like to understand the underlying logic.

Question 2: When diarization and speaker mapping go wrong as described above, is there a way to improve the result without training (e.g. by tuning hyperparameters)?

It would be really helpful if anyone could give me some advice.
Thank you.

hbredin · 2024-06-19T13:17:19Z

Maintainer

Setting num_speakers neither simplifies nor speeds up the clustering process. Code for this part is available here.
As you experienced yourself, setting num_speakers to the actual number of speakers does not automatically lead to improved performance. In my experience, it is actually the opposite. Clustering is a difficult machine learning problem that has yet to be solved. Without training or designing completely new approaches, there is no obvious way towards improvement.
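
To make the behaviour described in the question concrete, here is a toy sketch in pure Python. It is not pyannote's code. It assumes that fixing the number of clusters roughly amounts to continuing to merge the two closest clusters until the requested count is reached, instead of stopping at a distance threshold. The 2-D "embeddings", the speaker names, and the helpers `agglomerate` and `speakers_in` are all hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Mean point of a cluster (a list of equal-length tuples)."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))


def closest_pair(clusters):
    """Return (distance, i, j) for the two clusters with the nearest centroids."""
    return min(
        (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
        for i, j in combinations(range(len(clusters)), 2)
    )


def agglomerate(points, threshold=None, num_clusters=None):
    """Toy centroid-linkage clustering.

    With num_clusters set, merging continues until that many clusters remain.
    Otherwise merging stops once the closest pair is farther than threshold.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        dist, i, j = closest_pair(clusters)
        if num_clusters is not None:
            if len(clusters) <= num_clusters:
                break
        elif dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical embeddings: A and B are distinct but fairly close speakers,
# while speaker C appears in two far-apart groups (an over-split speaker).
speaker_of = {
    (0.0, 0.0): "A", (0.1, 0.0): "A",
    (1.0, 0.0): "B", (1.1, 0.0): "B",
    (5.0, 0.0): "C", (5.1, 0.0): "C",
    (5.0, 2.0): "C", (5.1, 2.0): "C",
}
points = list(speaker_of)


def speakers_in(clusters):
    """List the true speakers found in each cluster, in a stable order."""
    return sorted(sorted({speaker_of[p] for p in c}) for c in clusters)


by_threshold = agglomerate(points, threshold=0.5)
forced = agglomerate(points, num_clusters=3)

print(speakers_in(by_threshold))  # four clusters: A, B, and C split in two
print(speakers_in(forced))        # three clusters: A and B merged, C still split
```

In this toy setup, asking for the true number of speakers (3) merges A and B, because they are the closest remaining pair, and leaves C split in two. That mirrors the speaker2/speaker4 merge reported above: the constraint only changes where merging stops, not which clusters are considered similar.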
