[Task] Extend the list of supported audio features #193

Open
5 tasks
fabiocat93 opened this issue Nov 15, 2024 · 6 comments
Labels: enhancement, help wanted

Comments

@fabiocat93
Collaborator

fabiocat93 commented Nov 15, 2024

Description

We want to expand the audio features that senselab can extract. This will include:

  • Time-varying features: Features that provide detailed information about audio properties over time.
  • Aggregated features: Statistical or summarized representations of audio properties (e.g., min, max, range, mean pitch, standard deviation of feature X).
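To make the distinction concrete, here is a minimal sketch of how a time-varying track could be reduced to aggregated features. The function name `aggregate_feature`, the returned keys, and the convention of marking unvoiced frames with NaN are assumptions for illustration only, not part of senselab's API.

```python
import math
import statistics


def aggregate_feature(values):
    """Summarize a time-varying feature track (one value per frame).

    Hypothetical helper: frames marked as None or NaN (e.g. unvoiced
    frames in a pitch track) are skipped before computing statistics.
    """
    valid = [v for v in values if v is not None and not math.isnan(v)]
    if not valid:
        raise ValueError("no valid frames to aggregate")
    lowest, highest = min(valid), max(valid)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "mean": statistics.fmean(valid),
        # Population standard deviation over the valid frames.
        "std": statistics.pstdev(valid),
    }


# Made-up pitch contour in Hz; the NaN stands for an unvoiced frame.
pitch_hz = [110.0, 112.0, float("nan"), 118.0, 120.0]
summary = aggregate_feature(pitch_hz)
```

The same helper would apply to any per-frame track (intensity, loudness, a formant), which is one argument for keeping the time-varying extraction and the aggregation as separate steps.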

Tasks

  • Research and shortlist additional audio features (time-varying and aggregated).
  • Define how these features will be implemented in senselab.
  • Update senselab's feature extraction API to include the new features.
  • Write tests to ensure the correctness and reliability of the new features.
  • Update documentation to reflect the changes.

fabiocat93 added the enhancement label Nov 15, 2024

@fabiocat93
Collaborator Author

Let's start by using this space to discuss features of interest.

@fabiocat93
Collaborator Author

fabiocat93 commented Nov 15, 2024

About time-varying features: @satra (in a Slack conversation) suggested:

  • formants
  • pitch
  • intensity
  • loudness
  • glottal flow (derivative of)
  • vad (this should detect any vocalizations)
  • non-vocal fold source (aspiration/frication)

@fabiocat93 added the following (also inspired by @Rahul-Brito's slides; of course, @Rahul-Brito, feel free to add and comment further):

  • Energy
  • Perceptual Linear Predictive (PLP)
  • Linear Predictive Coding (LPC)
  • Line Spectral Pairs (LSP)
  • Spectral Shape descriptors (Spectral centroid, Spectral bandwidth, ...)
  • Harmonics-to-Noise Ratio (and maybe speech-to-noise ratio; here is an interesting model for estimating it: https://huggingface.co/pyannote/brouhaha)
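
As a rough illustration of what two of the simpler items above (energy and spectral centroid) look like as time-varying features, here is a dependency-free sketch. The function names, the frame and hop sizes, and the naive DFT are all assumptions chosen for readability; a real implementation would more likely rely on an FFT from an existing audio library.

```python
import cmath
import math


def split_frames(signal, frame_length, hop_length):
    """Cut a signal into overlapping frames (trailing partial frame dropped)."""
    last_start = len(signal) - frame_length
    return [signal[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


# Synthetic test signal: a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
framed = split_frames(signal, frame_length=64, hop_length=32)
energies = [frame_energy(f) for f in framed]
centroids = [spectral_centroid(f, sample_rate) for f in framed]
```

Each list holds one value per frame, so both are time-varying tracks that could later be summarized into aggregated features.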

@fabiocat93
Collaborator Author

Also, it may be cool to have some functions for plotting (parselmouth) features, like in here.

@fabiocat93
Collaborator Author

Also, @skirdey open-sourced some code for audio metrics calculation of PESQ, CSIG, CBAK, COVL, STOI. It may be interesting to make those feats part of senselab (some we already have).

@satra
Collaborator

satra commented Nov 18, 2024

audio metrics calculation of PESQ, CSIG, CBAK, COVL, STOI

PESQ and STOI are already computed by torchaudio_squim. Perhaps add the others to that section/workflow.

Projects
Status: Needs brainstorming

3 participants