Based on funcwj's uPIT, this repository adds training code with multi-GPU support and rebuilds the data loading on PyTorch's native DataLoader.
Demo Pages: results of the pure speech separation model
- Support Multi-GPU Training
- Use PyTorch's Built-in DataLoader
- Provide Pre-Trained Models
- PyTorch==1.3.0
- tqdm==4.32.1
- librosa==0.7.1
- scipy==1.3.0
- numpy==1.16.4
- PyYAML==5.1.1
- Generate the dataset using create-speaker-mixtures.zip with WSJ0 or TIMIT
- Prepare the scp files (each line of an scp file is "filename path"):
  python create_scp.py
- Prepare the cmvn statistics (cepstral mean and variance normalization, a computationally efficient normalization technique for robust speech recognition), calculated by the compute_cmvn.py script:
  python compute_cmvn.py ./tt_mix.scp ./cmvn.dict
- Modify the yaml config: update the scp and cmvn paths, and set num_spk in run_pit.py to the number of speakers.
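For illustration only, the fields to edit might look like the fragment below; the key names are assumptions based on the steps above, not the repository's exact yaml schema:

```yaml
# Hypothetical fragment: point these at the scp files and cmvn
# statistics produced in the previous steps.
data:
  train_scp: ./tr_mix.scp
  test_scp: ./tt_mix.scp
  cmvn: ./cmvn.dict
```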
- Training:
  sh train.sh
- Inference:
  sh test.sh
- Kolbæk M, Yu D, Tan Z H, et al. Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2017, 25(10): 1901-1913.
- https://github.com/funcwj/uPIT-for-speech-separation