
EnCodec

An example of Meta's EnCodec model in MLX.[1] EnCodec is used to compress and generate audio.

Setup

Install the requirements:

pip install -r requirements.txt

Optionally install FFmpeg and SciPy for loading and saving audio files, respectively.

Install FFmpeg:

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
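
As a rough illustration of what FFmpeg does for loading, the sketch below decodes an audio file into raw float32 samples at a target sampling rate and channel count. It is a hypothetical stand-in, not the `load_audio` helper from `utils`. The names `ffmpeg_command`, `pcm_to_frames`, and `decode_audio` are made up for this example, and only standard FFmpeg options are used.

```python
import subprocess
import sys
from array import array


def ffmpeg_command(path: str, sampling_rate: int, channels: int) -> list[str]:
    """Build an FFmpeg command that decodes `path` to raw float32 PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",                 # never wait for keyboard input
        "-i", path,                 # any input format FFmpeg can read
        "-f", "f32le",              # raw 32-bit little-endian float samples
        "-ac", str(channels),       # convert to the model's channel count
        "-ar", str(sampling_rate),  # resample to the model's sampling rate
        "-",                        # write the result to stdout
    ]


def pcm_to_frames(raw: bytes, channels: int) -> list[list[float]]:
    """Split interleaved little-endian float32 PCM into per-frame sample lists."""
    samples = array("f")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # f32le data must be swapped on big-endian hosts
    return [list(samples[i : i + channels]) for i in range(0, len(samples), channels)]


def decode_audio(path: str, sampling_rate: int, channels: int) -> list[list[float]]:
    """Run FFmpeg and return the audio as a list of frames (hypothetical helper)."""
    result = subprocess.run(
        ffmpeg_command(path, sampling_rate, channels),
        capture_output=True,
        check=True,
    )
    return pcm_to_frames(result.stdout, channels)
```

For example, `decode_audio("path/to/audio", 48000, 2)` would return a list of `[left, right]` sample pairs, assuming FFmpeg is on the `PATH`.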

Install SciPy:

pip install scipy
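
SciPy is only needed for writing the output file. To show what that step involves, the sketch below writes float samples as a 16-bit PCM WAV file using the standard-library `wave` module instead of SciPy. The helper name `save_wav_int16` is hypothetical and is not the `save_audio` function from `utils`. Converting to 16-bit integers is an assumption about the desired output format.

```python
import sys
import wave
from array import array


def save_wav_int16(path: str, frames: list[list[float]], sampling_rate: int) -> None:
    """Write float frames (one list of channel samples per frame) as 16-bit PCM WAV."""
    channels = len(frames[0])
    pcm = array("h")  # signed 16-bit integers
    for frame in frames:
        for x in frame:
            x = max(-1.0, min(1.0, x))  # clip so the scaled value fits in int16
            pcm.append(int(round(x * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(channels)
        f.setsampwidth(2)  # 2 bytes per sample
        f.setframerate(sampling_rate)
        f.writeframes(pcm.tobytes())
```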

Example

An example using the model:

import mlx.core as mx
from utils import load, load_audio, save_audio

# Load the 48 kHz model and preprocessor.
model, processor = load("mlx-community/encodec-48khz-float32")

# Load an audio file
audio = load_audio("path/to/audio", model.sampling_rate, model.channels)

# Preprocess the audio (this can also be a list of arrays for batched
# processing).
feats, mask = processor(audio)

# Encode at the given bandwidth. A lower bandwidth results in more
# compression but lower reconstruction quality.
@mx.compile
def encode(feats, mask):
    return model.encode(feats, mask, bandwidth=3)

# Decode to reconstruct the audio
@mx.compile
def decode(codes, scales, mask):
    return model.decode(codes, scales, mask)


codes, scales = encode(feats, mask)
reconstructed = decode(codes, scales, mask)

# Trim any padding:
reconstructed = reconstructed[0, : len(audio)]

# Save the audio as a wave file
save_audio("reconstructed.wav", reconstructed, model.sampling_rate)
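
Batched preprocessing and the final trimming step go together: clips of different lengths are padded to a common length, a mask marks which samples are real, and the padding has to be cut off after decoding. The pure-Python sketch below illustrates the idea on mono clips. `pad_batch` and `trim_batch` are hypothetical helpers, not the processor's actual implementation, whose exact padding behavior may differ.

```python
def pad_batch(clips: list[list[float]]) -> tuple[list[list[float]], list[list[bool]]]:
    """Right-pad mono clips with zeros to a common length and build a validity mask."""
    max_len = max(len(clip) for clip in clips)
    padded = [clip + [0.0] * (max_len - len(clip)) for clip in clips]
    # True marks real samples, False marks padding.
    mask = [[i < len(clip) for i in range(max_len)] for clip in clips]
    return padded, mask


def trim_batch(outputs: list[list[float]], lengths: list[int]) -> list[list[float]]:
    """Drop the padded tail of each output, keeping only the original length."""
    return [out[:n] for out, n in zip(outputs, lengths)]
```

With the real model, the analogous trim for item `i` of a batch follows the single-clip pattern above, slicing `reconstructed[i]` to the length of the `i`-th input clip.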

The 24 kHz, 32 kHz, and 48 kHz MLX-formatted models are available in the Hugging Face MLX Community in several data types.
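
To make the `bandwidth` trade-off concrete: EnCodec quantizes each latent frame with a stack of residual codebooks, so the nominal bitrate is (number of codebooks) × (bits per code) × (frames per second). The sketch below assumes the figures reported for the original EnCodec models (1024-entry codebooks, i.e. 10 bits per code, with 75 frames/s for the 24 kHz model and 150 frames/s for the 48 kHz model). These values are not read from the MLX checkpoints, and the 32 kHz model may use different settings, so check its config before relying on them.

```python
import math


def num_codebooks(bandwidth_kbps: float, frame_rate_hz: float, codebook_size: int = 1024) -> int:
    """Number of residual codebooks needed to reach a target bitrate.

    Assumed defaults follow the original EnCodec release (1024-entry codebooks).
    """
    bits_per_code = math.log2(codebook_size)  # 1024 entries -> 10 bits
    return round(bandwidth_kbps * 1000 / (frame_rate_hz * bits_per_code))


def compression_ratio(sampling_rate: int, channels: int, bits_per_sample: int, bandwidth_kbps: float) -> float:
    """Raw PCM bitrate divided by the codec bitrate."""
    raw_bps = sampling_rate * channels * bits_per_sample
    return raw_bps / (bandwidth_kbps * 1000)
```

Under these assumptions, `bandwidth=3` on the 48 kHz model corresponds to 2 codebooks per frame, and 3 kbps is 512 times less data than 16-bit stereo PCM at 48 kHz (1,536 kbps).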

Optional

To convert models, use the convert.py script. To see the options, run:

python convert.py -h

Footnotes

  1. Refer to the arXiv paper and code for more details.