SpecUFEx

SpecUFEx stands for "Unsupervised Spectral Feature Extraction", an unsupervised machine learning algorithm to characterize time variations in spectral content of waveform data. We apply the method to earthquake seismograms. SpecUFEx combines probabilistic non-negative matrix factorization (NMF) and hidden Markov modeling (HMM) of spectrograms (short time Fourier transforms of waveform data) to generate "fingerprints", low dimensional representations of spectral variation through time. Both the NMF and HMM models are fit using stochastic variational inference; the method is therefore scalable to tens or hundreds of thousands of spectrograms. The resulting fingerprints can be used as features for either unsupervised (e.g. clustering) or supervised (e.g. classification) machine learning. The method is described in

Holtzman, B.K., Paté, A., Paisley, J., Waldhauser, F., Repetto, D.: Machine learning reveals cyclic changes in seismic source spectra in geysers geothermal field. Science advances 4(5) (2018)

Please cite this article if you use the package for research purposes.

This repository is a python port of Matlab code originally written by John Paisley at Columbia University. All python code included is written by Ben Holtzman, Theresa Sawi and Nate Groebner.

Installation

Clone this repository to your computer, cd to the directory, and use pip to install.

git clone https://github.com/specufex/specufex.git
cd specufex
pip install .

Alternatively, a Dockerfile is included that builds a container running Jupyterlab with an environment setup for SpecUFEx. A prebuilt container is available through the SCOPED project here. Or you can directly pull the container if you have Docker:

docker pull ghcr.io/seisscoped/specufex:latest

Usage

Fitting models and transforming data

SpecUFEx fits a group of $D x M$ spectrograms, where D is the number of rows (frequency bands) and M is the number of columns (timesteps) in each spectrogram. The spectrograms must be in a numpy-compatible matrix of dimension $N x D x M$, N being the number of spectrograms in the dataset. Each spectrogram must consist of all nonnegative (>=0) entries. (Note, this is not yet checked for.)

The two main classes in this package are BayesianNonparametricNMF and BayesianHMM. Each has fit, transform, and fit_transform methods to be consistent with the Scikit-learn API style.

The first step is to calculate the nonnegative matrix factorization of your data. This is done by creating a new BayesianNonParametricNMF object and calling its fit method. This function estimates the model parameters based on all of the data in X. In the future, we hope to create a convergence criterion based on the ELBO. We iteratively fit the model, one spectrogram at a time, selecting a random spectrogram from our data set. In the example below, X is the numpy matrix of spectrograms. Please note that your data must be nonnegative; i.e., all elements must be >= 0.

nmf = BayesianNonparametricNMF(X.shape)

batches = 10000
batch_size = 1

for i in range(batches):
    idx = np.random.randint(X.shape[0], size-batch_size)
    nmf.fit(X[idx])

This finds the left matrix of the NMF of the data. Transform the data to the reduced representation, Hs, (the right matrix) via

Vs = nmf.transform(X)

Pro tip: a step can be saved by the convenience method `fit_transform, which does the fitting and transformation in one command. Note, however, that this can take a long time, so you may want to do this in pieces so you can save the resulting NMF left matrix in case something goes wrong (like a power outage).

Next, fit the HMM model with the BayesianHMM class. Currently, in order to setup the object correctly the number of NMF patterns (num_pat) and the gain calculated by BayesianNonparametricNMF are passed to the constructor.

hmm = BayesianHMM(nmf.num_pat, nmf.gain)

batches = 10000
batch_size = 1

for i in range(batches):
    idx = np.random.randint(Vs.shape[0], size-batch_size)
    hmm.fit(Vs[idx])

Similar to the NMF calculation, the data are transformed to fingerprints with the transform function.

fingerprints, As, gams = hmm.transform(Vs)

Or, if you want to save a step, use fit_transform like above.

fingerprints, As, gams = hmm.fit_transform(Vs, nmf.EW)

The variable fingerprints has the calculated fingerprints (the ultimate matrices of interest), As has the state transition matrices of each spectrogram, and gams are the state sequence probability matrices.

Saving and loading models

Once you have fit either the NMF or HMM model (or both!) you can save the parameters for the model using built in functions. From the above examples where nmf and hmm are objects that contain trained models, simply use

nmf.save(filename)
hmm.save(filename)

to save the parameters. Likewise, to load an already saved model and instatiate a new model object use

nmf = BayesianNonparametricNMF.load(filename)
hmm = BayesianHMM.load(filename)

and now you have NMF and HMM models that are ready to transform your data.

Development

If you are interested in contributing to SpecUFEx development, please fork this repository and create a new branch for your new code. Create a developemnt environment with conda or virtualenv and install the dev dependencies with pip install -r requirements-dev.txt. Code formatting is done with ruff and is performed on every commit with pre-commit. Please write tests for the code you develop. We use pytest and nox for writing and running tests. When your code is done, submit a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 209 Commits
.github/workflows		.github/workflows
docs		docs
specufex		specufex
tests		tests
tutorials		tutorials
.coverage		.coverage
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.setup.py		.setup.py
Dockerfile		Dockerfile
Dockerfile.tutorial		Dockerfile.tutorial
LICENSE		LICENSE
README.md		README.md
noxfile.py		noxfile.py
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
scaling_results.csv		scaling_results.csv
specufex_scaling.py		specufex_scaling.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpecUFEx

Installation

Usage

Fitting models and transforming data

Saving and loading models

Development

About

Releases

Packages

Contributors 2

Languages

License

Specufex/specufex

Folders and files

Latest commit

History

Repository files navigation

SpecUFEx

Installation

Usage

Fitting models and transforming data

Saving and loading models

Development

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages