Spectral features are widely used in machine learning and deep learning research. This repository implements basic spectral features for voice and speech analysis:
- SpectralEntropy
- SpectralCentroid
- SpectralSpread
- SpectralSkewness
- SpectralKurtosis
- SpectralRolloffPoint
- SpectralCrest
- (WIP) SpectralFlux
- SpectralSlope
- SpectralFlatness
- (WIP) LongTermSpectralFlatness, based on Ma, Y., Nishihara, A. Efficient voice activity detection algorithm using long-term spectral flatness measure. J AUDIO SPEECH MUSIC PROC. 2013, 87 (2013).
The code is largely based on the MATLAB tutorial and is implemented with PyTorch, reusing some functions from SpeechBrain.
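Installation (the activation command below is for Windows; on Linux/macOS, run source venv/bin/activate instead):
git clone https://github.com/BrownsugarZeer/SpectralFeatures.git
python -m venv venv
venv\Scripts\activate.bat
pip install -r requirements.txt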
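The example waveform is <path_to_matlab>\MATLAB\R2019b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav (a sample shipped with MATLAB). It was downsampled from 44100 Hz to 16000 Hz and saved as Counting-16-44p1-mono-15secs_16000.wav, the file loaded in the examples below.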
- Spectral Entropy
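import torchaudio
# SpectralEntropy and plot_feature are assumed to be imported from this repository
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralEntropy(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)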
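The repository's implementation is not reproduced here. As a minimal sketch of what spectral entropy measures, assuming NumPy and a power spectrum `s` with frequency bins on the last axis (the function `spectral_entropy` is hypothetical, not part of this repository), each frame is treated as a probability distribution over bins and its Shannon entropy is divided by the log of the bin count:

```python
import numpy as np


def spectral_entropy(s: np.ndarray) -> np.ndarray:
    """Normalized spectral entropy of power spectra s with shape (..., bins)."""
    s = np.asarray(s, dtype=float)
    eps = np.finfo(float).eps  # avoids division by zero and log(0) for empty bins
    # Normalize each frame so its bins sum to 1, like a probability distribution.
    p = s / (np.sum(s, axis=-1, keepdims=True) + eps)
    h = -np.sum(p * np.log(p + eps), axis=-1)
    # Dividing by log(bins) maps a flat spectrum to 1.0.
    return h / np.log(s.shape[-1])
```

With this normalization a flat, noise-like spectrum scores 1 and a single spectral peak scores close to 0, so lower values indicate a more peaked, tonal frame.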
- Spectral Centroid
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralCentroid(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
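Spectral centroid is the power-weighted mean frequency of a frame. A NumPy sketch, assuming power spectra `s` of shape (..., bins) and bin center frequencies `f` of shape (bins,); `spectral_centroid` is a hypothetical name, not necessarily how SpectralCentroid computes it internally:

```python
import numpy as np


def spectral_centroid(s: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Power-weighted mean frequency of spectra s (..., bins) at bin frequencies f (bins,)."""
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    return np.sum(f * s, axis=-1) / np.sum(s, axis=-1)
```

For an n_fft-point real FFT at sample rate fs, the bin frequencies can be obtained with `np.fft.rfftfreq(n_fft, d=1 / fs)`.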
- Spectral Spread
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralSpread(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
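Spectral spread is the power-weighted standard deviation of frequency around the centroid. The following NumPy sketch (hypothetical `spectral_spread`, same `s` and `f` conventions as a (..., bins) spectrum and (bins,) frequencies) recomputes the centroid first so it stands alone:

```python
import numpy as np


def spectral_spread(s: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Power-weighted standard deviation of frequency around the spectral centroid."""
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    total = np.sum(s, axis=-1, keepdims=True)
    centroid = np.sum(f * s, axis=-1, keepdims=True) / total
    variance = np.sum((f - centroid) ** 2 * s, axis=-1, keepdims=True) / total
    return np.sqrt(variance)[..., 0]  # drop the kept bin axis
```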
- Spectral Skewness
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralSkewness(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
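Spectral skewness is the third standardized moment of the frequency distribution: it is 0 for a spectrum symmetric around its centroid and positive when the power has a tail toward higher frequencies. A NumPy sketch with the hypothetical name `spectral_skewness`:

```python
import numpy as np


def spectral_skewness(s: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Third standardized moment of the frequency distribution described by s (..., bins)."""
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    total = np.sum(s, axis=-1, keepdims=True)
    centroid = np.sum(f * s, axis=-1, keepdims=True) / total
    dev = f - centroid
    spread = np.sqrt(np.sum(dev ** 2 * s, axis=-1, keepdims=True) / total)
    skew = np.sum(dev ** 3 * s, axis=-1, keepdims=True) / (spread ** 3 * total)
    return skew[..., 0]
```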
- Spectral Kurtosis
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralKurtosis(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
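Spectral kurtosis is the fourth standardized moment, describing how concentrated the power is around the centroid versus in the tails. A NumPy sketch under the same assumptions as the previous features (`spectral_kurtosis` is a hypothetical name):

```python
import numpy as np


def spectral_kurtosis(s: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Fourth standardized moment of the frequency distribution described by s (..., bins)."""
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    total = np.sum(s, axis=-1, keepdims=True)
    centroid = np.sum(f * s, axis=-1, keepdims=True) / total
    dev = f - centroid
    spread = np.sqrt(np.sum(dev ** 2 * s, axis=-1, keepdims=True) / total)
    kurt = np.sum(dev ** 4 * s, axis=-1, keepdims=True) / (spread ** 4 * total)
    return kurt[..., 0]
```

Two equal peaks give the minimum value of 1, while a flat spectrum over five evenly spaced bins gives 1.7.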
- Spectral Rolloff Point
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralRolloffPoint(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
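The spectral rolloff point is the frequency below which a given fraction of a frame's total power lies. A NumPy sketch, assuming a threshold `kappa` of 0.95 (a common choice, not confirmed as this repository's default) and the hypothetical name `spectral_rolloff_point`:

```python
import numpy as np


def spectral_rolloff_point(s: np.ndarray, f: np.ndarray, kappa: float = 0.95) -> np.ndarray:
    """Frequency below which a fraction kappa of each frame's total power lies."""
    s = np.asarray(s, dtype=float)
    cumulative = np.cumsum(s, axis=-1)
    threshold = kappa * cumulative[..., -1:]  # kappa times the total power, per frame
    # argmax on a boolean array returns the first True: the first bin reaching the threshold.
    idx = np.argmax(cumulative >= threshold, axis=-1)
    return np.asarray(f)[idx]
```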
- Spectral Crest
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralCrest(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
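Spectral crest compares the strongest bin with the average level of the frame, so strongly peaked spectra score high and flat spectra score 1. A minimal NumPy sketch (hypothetical `spectral_crest`, power spectra of shape (..., bins)):

```python
import numpy as np


def spectral_crest(s: np.ndarray) -> np.ndarray:
    """Ratio of the maximum to the mean of each power spectrum s (..., bins)."""
    s = np.asarray(s, dtype=float)
    return np.max(s, axis=-1) / np.mean(s, axis=-1)
```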
- Spectral Flux
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralFlux(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
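Spectral flux measures how much the spectrum changes between consecutive frames. SpectralFlux is marked as work in progress, so the following is only a NumPy sketch under stated assumptions: `spectral_flux` is a hypothetical name, the input is a (frames, bins) array of magnitude or power spectra, and flux is the p-norm of the frame-to-frame difference with p = 2 by default:

```python
import numpy as np


def spectral_flux(s: np.ndarray, p: float = 2.0) -> np.ndarray:
    """p-norm of the difference between consecutive frames of s (frames, bins).

    Returns frames - 1 values, one for each pair of adjacent frames.
    """
    diff = np.diff(np.asarray(s, dtype=float), axis=0)
    return np.sum(np.abs(diff) ** p, axis=-1) ** (1.0 / p)
```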
- Spectral Slope
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralSlope(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
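Spectral slope is the slope of a least-squares line fitted to the spectrum as a function of frequency. A NumPy sketch, with `spectral_slope` as a hypothetical name, power spectra `s` of shape (..., bins) and bin frequencies `f` of shape (bins,):

```python
import numpy as np


def spectral_slope(s: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Least-squares slope of s (..., bins) regressed on bin frequency f (bins,)."""
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    f_dev = f - f.mean()
    s_dev = s - s.mean(axis=-1, keepdims=True)
    # Covariance of (f, s) divided by the variance of f.
    return np.sum(f_dev * s_dev, axis=-1) / np.sum(f_dev ** 2)
```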
- Spectral Flatness
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = SpectralFlatness(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
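Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum: close to 1 for noise-like frames and close to 0 for tonal ones. The sketch below (hypothetical `spectral_flatness`, NumPy) computes the geometric mean in the log domain to avoid underflow from multiplying many small values:

```python
import numpy as np


def spectral_flatness(s: np.ndarray) -> np.ndarray:
    """Geometric mean over arithmetic mean of power spectra s (..., bins)."""
    s = np.asarray(s, dtype=float)
    eps = np.finfo(float).eps  # keeps log() finite for zero-power bins
    geometric = np.exp(np.mean(np.log(s + eps), axis=-1))
    arithmetic = np.mean(s, axis=-1)
    return geometric / (arithmetic + eps)
```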
- Long-Term Spectral Flatness
x, fs = torchaudio.load("Counting-16-44p1-mono-15secs_16000.wav")
compute_feat = LongTermSpectralFlatness(sample_rate=fs)
spectr_feat = compute_feat(x)
plot_feature(x, fs, spectr_feat)
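Following the general idea of the cited Ma and Nishihara paper, long-term spectral flatness compares, for each frequency bin, the geometric and arithmetic means of the power over the last R frames and sums the log ratios across bins; values near 0 indicate a stationary, noise-like signal, while more negative values indicate a changing one such as speech. Since LongTermSpectralFlatness is still work in progress, this is only a simplified NumPy sketch: `long_term_spectral_flatness` is a hypothetical name, the default `r=30` is an arbitrary illustrative value, and the paper's own spectral estimation details are not reproduced:

```python
import numpy as np


def long_term_spectral_flatness(s: np.ndarray, r: int = 30) -> np.ndarray:
    """Sketch of a long-term flatness measure for power spectra s (frames, bins).

    Returns one value per frame from frame r - 1 onward (empty if frames < r).
    """
    s = np.asarray(s, dtype=float)
    eps = np.finfo(float).eps  # keeps log() finite for zero-power bins
    log_s = np.log(s + eps)
    values = []
    for m in range(r - 1, s.shape[0]):
        window = slice(m - r + 1, m + 1)  # the r most recent frames
        geometric = np.exp(log_s[window].mean(axis=0))  # per-bin geometric mean
        arithmetic = s[window].mean(axis=0) + eps  # per-bin arithmetic mean
        values.append(np.sum(np.log(geometric / arithmetic)))
    return np.array(values)
```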
Note: the code favors readability, which may sometimes come at the cost of performance.