Adds generation of songs with a length of over 30 seconds.
Adds the ability to continue songs.
Adds a seed option.
Adds ability to load locally downloaded models.
Adds training (Thanks to chavinlo's repo https://github.com/chavinlo/musicgen_trainer)
Adds macOS support.
Adds queue (on the main-queue branch: https://github.com/1aienthusiast/audiocraft-infinity-webui/tree/main-queue)
Disables (hopefully) the gradio analytics.
Python 3.9 is recommended.
- Clone the repo:
git clone https://github.com/1aienthusiast/audiocraft-infinity-webui.git
- Install pytorch:
pip install 'torch>=2.0'
- Install the requirements:
pip install -r requirements.txt
- Clone my fork of the Meta audiocraft repo and chavinlo's MusicGen trainer inside the repositories folder:
cd repositories
git clone https://github.com/1aienthusiast/audiocraft
git clone https://github.com/chavinlo/musicgen_trainer
cd ..
If you have already cloned the Meta audiocraft repo, remove it and clone the provided fork instead. The seed option needs the fork to work.
cd repositories
rm -rf audiocraft/
git clone https://github.com/1aienthusiast/audiocraft
git clone https://github.com/chavinlo/musicgen_trainer
cd ..
python webui.py
Run git pull inside the root folder to update the webui, and the same command inside repositories/audiocraft to update audiocraft.
Meta provides 4 pre-trained models:
- small: 300M model, text to music only - 🤗 Hub
- medium: 1.5B model, text to music only - 🤗 Hub
- melody: 1.5B model, text to music and text+melody to music - 🤗 Hub
- large: 3.3B model, text to music only - 🤗 Hub
Needs a GPU!
I recommend 12GB of VRAM for the large model.
Create a folder and place your audio and caption files in it. They must be in WAV and TXT format, respectively.
Place the folder in training/datasets/.
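Before training, it can help to confirm that every WAV file has a matching TXT caption and vice versa. The snippet below is a minimal sketch of such a check; `find_unpaired` is a hypothetical helper and not part of this repository or the trainer, and it assumes lowercase `.wav` and `.txt` extensions.

```python
from pathlib import Path


def find_unpaired(dataset_dir):
    """Return the sorted file stems that have a WAV without a TXT, or a TXT without a WAV."""
    dataset_dir = Path(dataset_dir)
    wav_stems = {p.stem for p in dataset_dir.glob("*.wav")}
    txt_stems = {p.stem for p in dataset_dir.glob("*.txt")}
    # Symmetric difference: stems present in exactly one of the two sets.
    return sorted(wav_stems ^ txt_stems)
```

Calling `find_unpaired("training/datasets/my_dataset")` (the folder name is only an example) returns an empty list when every file is paired.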
Important: Split your audio into 35-second chunks. Only the first 30 seconds of each chunk will be processed. Audio files cannot be shorter than 30 seconds.
In this example, segment_000.txt contains the caption "jazz music, jobim" for wav file segment_000.wav
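The chunking described above could be scripted roughly as follows. This is a sketch only, not a tool shipped with this repository: `split_wav` is a hypothetical helper, it relies on Python's standard `wave` module (so it assumes uncompressed PCM WAV input), and it writes the same caption for every chunk, which may not suit your dataset. The `segment_000` naming simply follows the example above.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, caption, chunk_seconds=35, min_seconds=30):
    """Split a PCM WAV file into fixed-length chunks, each with a caption file.

    Returns the list of WAV paths written. A trailing remainder shorter
    than min_seconds is dropped, since shorter clips cannot be used.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frame_size = params.sampwidth * params.nchannels
        frames_per_chunk = chunk_seconds * params.framerate
        min_frames = min_seconds * params.framerate
        index = 0
        while True:
            data = reader.readframes(frames_per_chunk)
            n_frames = len(data) // frame_size
            if n_frames == 0 or n_frames < min_frames:
                break  # end of file, or remainder too short to keep
            stem = f"segment_{index:03d}"
            wav_path = out_dir / f"{stem}.wav"
            with wave.open(str(wav_path), "wb") as writer:
                # The frame count in the header is corrected when the file is closed.
                writer.setparams(params)
                writer.writeframes(data)
            (out_dir / f"{stem}.txt").write_text(caption, encoding="utf-8")
            written.append(wav_path)
            index += 1
    return written
```

For example, `split_wav("song.wav", "training/datasets/my_dataset", "jazz music, jobim")` would write segment_000.wav, segment_000.txt, and so on, where `song.wav` and `my_dataset` are placeholder names.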
- dataset_path: path to your dataset with WAV and TXT pairs.
- model_id: MusicGen model to use (the model it will be finetuned on). Can be small/medium/large. Default: small
- lr: Float, learning rate. Default: 0.0001/1e-4
- epochs: Integer, epoch count. Default: 5
- use_wandb: Integer, 1 to enable wandb, 0 to disable it. Default: 0 (disabled)
- save_step: Integer, number of steps between checkpoint saves. Default: None
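To keep the options and their defaults in one place, for instance when scripting several runs, they can be mirrored in a small data structure. The sketch below is illustrative only: `TrainingOptions` and `ALLOWED_MODEL_IDS` are hypothetical names, not the trainer's actual interface, and the validation rules are assumptions drawn from the descriptions above.

```python
from dataclasses import dataclass
from typing import Optional

# Model names accepted by the training options listed above.
ALLOWED_MODEL_IDS = ("small", "medium", "large")


@dataclass
class TrainingOptions:
    """Hypothetical container mirroring the documented options and defaults."""

    dataset_path: str
    model_id: str = "small"
    lr: float = 1e-4
    epochs: int = 5
    use_wandb: int = 0  # 1 enables wandb, 0 disables it
    save_step: Optional[int] = None  # None means no step interval is set

    def __post_init__(self):
        if self.model_id not in ALLOWED_MODEL_IDS:
            raise ValueError(f"model_id must be one of {ALLOWED_MODEL_IDS}")
        if self.use_wandb not in (0, 1):
            raise ValueError("use_wandb must be 0 or 1")
        if self.save_step is not None and self.save_step <= 0:
            raise ValueError("save_step must be a positive integer or None")
```

Constructing `TrainingOptions(dataset_path="training/datasets/my_dataset")` yields the defaults listed above, and an unsupported `model_id` raises a `ValueError` before any training starts.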
Once training finishes, the model (and checkpoints) will be available under the models/ directory.
The model is saved to models/ as lm_final.pt. To load it:
- Place it in models/DIRECTORY_NAME/
- In the Inference tab, choose custom as the model and enter DIRECTORY_NAME into the input field.
- In the Inference tab, choose the model it was finetuned on.
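The first step, placing the checkpoint in its own directory, can be done by hand or with a few lines of Python. The following is a minimal sketch; `install_finetuned_model` is a hypothetical helper and not part of the webui, and it assumes the file keeps its original name (lm_final.pt) inside the new directory.

```python
import shutil
from pathlib import Path


def install_finetuned_model(checkpoint, models_dir, directory_name):
    """Copy a checkpoint into models_dir/directory_name/ and return the new path."""
    target_dir = Path(models_dir) / directory_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(checkpoint).name
    # copy2 keeps the original file in place and preserves its timestamps.
    shutil.copy2(checkpoint, target)
    return target
```

For example, `install_finetuned_model("models/lm_final.pt", "models", "my_finetune")` copies the checkpoint to models/my_finetune/lm_final.pt, after which `my_finetune` (an example name) is what you enter as DIRECTORY_NAME.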
For Google Colab you need to use the --share flag.
- The code in this repository is released under the AGPLv3 license as found in the LICENSE file.