
Add Push to Hub functionality to Model and Pipeline #1699

Open · wants to merge 20 commits into base: develop
Conversation


@kamilakesbi commented Apr 29, 2024

Hi @hbredin,

I've started working on adding a push_to_hub method to both Model and Pipeline classes. It will hopefully help users push their custom pyannote speaker-segmentation and speaker-embedding models to the Hugging Face Hub, and use them within custom speaker-diarization pipelines.

In this PR, I've added:

1. A push_to_hub method to the base Model class:

The method is compatible with both the pyannote PyanNet segmentation model and the WeSpeakerResNet34 speaker embedding model. It will:

  1. Save the state dict in a pytorch_model.bin file.

  2. Write a config.yaml file similar to that of pyannote/segmentation-3.0 or pyannote/wespeaker-voxceleb-resnet34-LM.

  3. Write a minimal README file (which we can work on together), and add appropriate tags and a license.
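For intuition, here is a rough sketch of what such a push has to do under the hood with the huggingface_hub client (an illustration of the three steps above, not the PR's actual implementation; the helper name is made up):

import torch
from huggingface_hub import HfApi

def push_model_sketch(model, repo_id):
    api = HfApi()
    # Create the target model repo if it does not exist yet
    api.create_repo(repo_id, repo_type="model", exist_ok=True)

    # 1. Save the state dict and upload it as pytorch_model.bin
    torch.save(model.state_dict(), "pytorch_model.bin")
    api.upload_file(
        path_or_fileobj="pytorch_model.bin",
        path_in_repo="pytorch_model.bin",
        repo_id=repo_id,
    )
    # 2./3. config.yaml and README.md would be generated locally
    # and uploaded with api.upload_file in the same way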

I've tested the method using the following scripts:

  • Segmentation Model:
from pyannote.audio import Model

segmentation_model = Model.from_pretrained("pyannote/segmentation-3.0")
segmentation_model.push_to_hub('kamilakesbi/speaker-segmentation-test')

Here is the result :)

Note: I've used the diarizers library here to first load a fine-tuned speaker segmentation model from the Hugging Face Hub, convert it to a pyannote format, and push it to the Hub.
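(For reference, a hedged sketch of that conversion flow; the diarizers API used here, SegmentationModel.from_pretrained and to_pyannote_model, is assumed from its README, and the repo names are examples:)

from diarizers import SegmentationModel

# Load a fine-tuned segmentation model from the Hub (diarizers format)
model = SegmentationModel.from_pretrained("diarizers-community/speaker-segmentation-fine-tuned-callhome-jpn")

# Convert it to a pyannote.audio Model, then push with the new method
pyannote_model = model.to_pyannote_model()
pyannote_model.push_to_hub("kamilakesbi/speaker-segmentation-test")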

  • Speaker Embedding Model:
from pyannote.audio import Model

speaker_embedding_model = Model.from_pretrained('pyannote/wespeaker-voxceleb-resnet34-LM')
speaker_embedding_model.push_to_hub('kamilakesbi/speaker-embedding-test')

Here is the result :)

2. A push_to_hub method to the base Pipeline class:

  • Here, it will generate the config file associated with the pipeline, replace the embedding and segmentation models with the specified ones from the Hub, and push the updated config file to the Hub (see the config sketch just below).
  • I added the possibility to push the model checkpoints associated with the pipeline, or just the pipeline config file with pointers to the models' Hub repositories.
  • It will also push a minimal README with tags and a license (again, we can work on it to customize the output).
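For context, a pyannote pipeline config.yaml (e.g. the one shipped with pyannote/speaker-diarization-3.1) has roughly the following shape, so the method essentially rewrites the two model entries under params (abridged; the values shown are the example repos from this PR):

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: kamilakesbi/speaker-embedding-test
    segmentation: kamilakesbi/speaker-segmentation-test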

The method can be used like this:

from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-3.1')
pipeline.embedding = 'kamilakesbi/speaker-embedding-test'
pipeline.segmentation = 'kamilakesbi/speaker-segmentation-test'
pipeline.push_to_hub('kamilakesbi/spd_pipeline_test')

We can also push the model's checkpoints using:

pipeline.push_to_hub('kamilakesbi/spd_pipeline_test', save_checkpoints=True)
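Assuming the push succeeded, the pushed pipeline should load straight back from the Hub with the usual API (repo name as in the example above):

from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained('kamilakesbi/spd_pipeline_test')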

Note that this is still a work in progress :) I can make changes to the code and adapt it to pyannote's needs!

Hope that this PR will be useful to pyannote.


@sanchit-gandhi left a comment


General functionality looks good! Left some design thoughts on how we can improve ease-of-use (e.g. saving the embedding/segmentation models when we push the pipeline)

pyannote/audio/core/model.py (outdated, resolved)
pyannote/audio/core/model.py (outdated, resolved)
pyannote/audio/core/model.py (outdated, resolved)
pyannote/audio/core/pipeline.py (outdated, resolved)
segmentation_model = self.segmentation_model

# Modify the config with new segmentation and embedding models:
config["pipeline"]["params"]["embedding"] = embedding_model

@sanchit-gandhi Apr 30, 2024


As discussed offline: an elegant solution would be to save both the embedding and segmentation models to subfolders in the repo (embedding and segmentation respectively), and then load the weights from these subfolders when we call .from_pretrained

Your repo structure for kamilakesbi/spd_pipeline_test could look something like the following:

├── config.yaml                  <- Top-level pipeline config
├── embedding                    <- Subfolder for the embedding model
|   ├── config.yaml             
|   ├── pytorch_model.bin
├── segmentation                 <- Subfolder for the segmentation model
|   ├── config.yaml             
|   ├── pytorch_model.bin

And your top-level yaml file could have an extra entry:

    embedding: kamilakesbi/spd_pipeline_test
    embedding_subfolder: embedding
    ...
    segmentation: kamilakesbi/spd_pipeline_test
    segmentation_subfolder: segmentation

Note that this would require updating .from_pretrained to handle this extra subfolder logic, e.g. along the lines of the sketch below.
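A minimal sketch of that extra logic (hypothetical code illustrating the proposal; it assumes Model.from_pretrained accepts the subfolder argument this PR introduces, and the *_subfolder config keys are the suggestion above, not an existing pyannote API):

from pyannote.audio import Model

# Inside Pipeline.from_pretrained, after reading the top-level config:
params = config["pipeline"]["params"]
for name in ("embedding", "segmentation"):
    subfolder = params.pop(f"{name}_subfolder", None)
    if subfolder is not None:
        # Load the sub-model from the given subfolder of the same repo
        params[name] = Model.from_pretrained(params[name], subfolder=subfolder)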

@kamilakesbi (Author)


I handled this differently:

  • If we want to save the checkpoints, we add a save_checkpoints=True parameter to pipeline.push_to_hub. We would then get a repo structure like the one you proposed @sanchit-gandhi, but the top-level yaml file would look like this:

checkpoints: True
params:
  embedding: 'kamilakesbi/speaker-embedding-test'
  segmentation: 'kamilakesbi/speaker-segmentation-test'

  • If we don't want to store checkpoints on the Hub, then we need pointers to the segmentation and embedding models on the Hub. In this case the config file would look like this:

checkpoints: False
params:
  embedding: 'kamilakesbi/speaker-embedding-test'
  segmentation: 'kamilakesbi/speaker-segmentation-test'


As discussed offline, saving all sub-models means the model repo on the Hub is fully portable -> users can clone the repository and have all sub-models available to them locally

This is the design that was adopted for diffusers pipelines and has worked very well

Thus, we'll assume that any new checkpoints being pushed will follow this new repo structure, with an exception for current pipelines on the Hub that leverage components from multiple repositories

pyannote/audio/core/model.py (resolved)
pyannote/audio/core/pipeline.py (outdated, resolved)

@sanchit-gandhi left a comment


Looks great! Just a few suggestions regarding the save structure. Can we add relevant tests as well? (both for the model and the pipeline)

pyannote/audio/core/model.py (resolved)
pyannote/audio/core/model.py (outdated, resolved)
repo_type="model",
)

model_type = str(type(self)).split("'")[1].split(".")[-1]


Is there not a model attribute or config param we can use to get this in a more robust way?

@kamilakesbi (Author)


Not that I'm aware of... but it would be great!
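(A small hedged aside: plain Python exposes the class name directly, so the string splitting above could be replaced by something like this sketch.)

# type(self).__name__ yields e.g. "PyanNet" or "WeSpeakerResNet34",
# the same value the split-based expression extracts
model_type = type(self).__name__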

pyannote/audio/core/model.py (outdated, resolved)

@kamilakesbi changed the title from "[Work In Progress] - Add Push to Hub functionality to Model and Pipeline" to "Add Push to Hub functionality to Model and Pipeline" on May 23, 2024
@kamilakesbi (Author)

Hi @hbredin,

It would be nice if you had time to review this PR so that we can iterate on it :)

Thank you!

@hbredin (Member) commented May 28, 2024

Apologies, I will eventually have a look at it, but I really don't have the bandwidth right now.

@sanchit-gandhi

Hey @hbredin! No rush on reviewing this PR; whenever you get the chance, we'd love to hear your feedback on the proposed changes! Otherwise, is there another maintainer who could give a quick review in the meantime?

@hbredin (Member) commented Jun 14, 2024

Hey @sanchit-gandhi. I understand the frustration, but I am actually the sole maintainer and also wear many other hats. I am doing my best but have other priorities right now (like the upcoming 3.3.0 release with speech separation support).

Comment on lines +136 to +141
# If hub repo contains subfolders, load models and pipeline:
embedding = Model.from_pretrained(model_id, subfolder="embedding")
segmentation = Model.from_pretrained(model_id, subfolder="segmentation")
pipeline = Klass(**params)
pipeline.embedding = embedding
pipeline.segmentation_model = segmentation
@hbredin (Member)


This seems way too specific to pyannote/speaker-diarization-3.1. I'd like to find a better (= more generic) way.

We could do something like prepending subfolder names with @model (or anything that makes sense) to indicate to Pipeline.from_pretrained that a model should be loaded from the corresponding subfolder.

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: @model/embedding 
    segmentation: @model/segmentation

Similarly, we could use @pipeline/ to load sub-pipelines, and later @whatever if we ever want to add new pyannote stuff (I already have one in mind that I cannot really talk about right now).
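A minimal sketch of how Pipeline.from_pretrained could resolve such prefixed entries (hypothetical code illustrating the proposal above, not part of this PR; it assumes the subfolder argument to Model.from_pretrained that this PR adds):

from pyannote.audio import Model

# Hypothetical: resolve "@model/<subfolder>" params into loaded Models
for key, value in list(params.items()):
    if isinstance(value, str) and value.startswith("@model/"):
        subfolder = value[len("@model/"):]
        params[key] = Model.from_pretrained(model_id, subfolder=subfolder)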
