This is a rough implementation of *Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments* (Ethan Manilow et al., ICASSP 2020).
This implementation does not reach the performance reported in the paper.
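The paper's model (Cerberus) roughly shares a BLSTM trunk over the mixture spectrogram between per-instrument mask (separation) heads and per-instrument piano-roll (transcription) heads. The PyTorch sketch below only illustrates that idea under assumed layer sizes; it is not the network defined in `network/cerberus_wrapper.py`.

```python
import torch
import torch.nn as nn

class CerberusSketch(nn.Module):
    """Illustrative sketch: shared BLSTM trunk, per-source mask + transcription heads."""

    def __init__(self, n_bins=1025, n_sources=3, n_pitches=88,
                 hidden=300, num_layers=4):  # sizes are assumptions, not the repo's values
        super().__init__()
        # Shared recurrent trunk over magnitude-spectrogram frames of the mixture.
        self.blstm = nn.LSTM(n_bins, hidden, num_layers=num_layers,
                             batch_first=True, bidirectional=True)
        # One mask head (separation) and one transcription head per source.
        self.mask_heads = nn.ModuleList(
            [nn.Linear(2 * hidden, n_bins) for _ in range(n_sources)])
        self.transcription_heads = nn.ModuleList(
            [nn.Linear(2 * hidden, n_pitches) for _ in range(n_sources)])

    def forward(self, mix_mag):
        # mix_mag: (batch, time, n_bins) magnitude spectrogram of the mixture.
        h, _ = self.blstm(mix_mag)
        masks = torch.stack(
            [torch.sigmoid(head(h)) for head in self.mask_heads], dim=1)
        rolls = torch.stack(
            [torch.sigmoid(head(h)) for head in self.transcription_heads], dim=1)
        # masks: (batch, n_sources, time, n_bins), multiplied with mix_mag for separation.
        # rolls: (batch, n_sources, time, n_pitches), frame-level transcription estimates.
        return masks, rolls
```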
|       | Precision | Recall | Accuracy |
|-------|-----------|--------|----------|
| Piano | 0.585     | 0.566  | 0.460    |
| Bass  | 0.797     | 0.817  | 0.747    |
| Drums | 0.230     | 0.417  | 0.133    |
- Note: there is no benchmark dataset for this task. These results were measured on data I randomly created from the test set of the Slakh2100 dataset, so it is not appropriate to compare them quantitatively with the results reported in the paper.
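The numbers above are frame-level transcription metrics. The numpy sketch below shows the standard frame-level definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), accuracy = TP/(TP+FP+FN)); the helper `frame_metrics` is hypothetical and not necessarily the evaluation code used in this repository.

```python
import numpy as np

def frame_metrics(pred_roll, ref_roll):
    """Frame-level precision/recall/accuracy from binary piano rolls of shape (time, pitches)."""
    pred = pred_roll.astype(bool)
    ref = ref_roll.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    fp = np.logical_and(pred, ~ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    accuracy = tp / max(tp + fp + fn, 1)  # frame-level accuracy: TP / (TP + FP + FN)
    return precision, recall, accuracy
```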
Inference takes a hyper-parameter file, a trained checkpoint, an input mixture, and an output directory:

```
python inference.py hparams.yaml weight.ckpt input.wav output_dir/
```
- Get the Slakh2100 dataset (see: Slakh2100 Project)
- Downsample the audio to 16 kHz (a resampling sketch is given after this list)
- Modify configs/config.yaml:
```yaml
data_dir: "/path/to/slakh2100_flac_16k/"

# see: validation_epoch_end() in network/cerberus_wrapper.py
sample_audio:
  path: "/path/to/sample/audio/sample_rate_16k.wav"
  offset: 1264000
  num_frames: 160000
```
- Run training

```
python train.py
```
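For the downsampling step above, one way to prepare a 16 kHz copy of the dataset offline is sketched below. The use of librosa and soundfile, and the directory names, are assumptions for illustration; this is not necessarily how the repository expects the data to be prepared.

```python
import pathlib

import librosa
import soundfile as sf

SRC = pathlib.Path("/path/to/slakh2100_flac")      # original dataset root (assumed)
DST = pathlib.Path("/path/to/slakh2100_flac_16k")  # should match data_dir in configs/config.yaml

for flac in SRC.rglob("*.flac"):
    # Load at the target rate (librosa resamples on load) and write a 16 kHz copy,
    # mirroring the original directory structure.
    audio, sr = librosa.load(str(flac), sr=16000, mono=True)
    out = DST / flac.relative_to(SRC)
    out.parent.mkdir(parents=True, exist_ok=True)
    sf.write(str(out), audio, sr)
```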
- Jongho Choi ([email protected])
- Jiwon Kim
- Ahyeon Choi