How to train with my own data #1

Open
todalex opened this issue Mar 4, 2021 · 4 comments
todalex commented Mar 4, 2021

Hello everyone,
I want to start training on my own data and I ran into problems.
I used audioDataloader to make a dataset with torch, from the link below:
https://github.com/muhdhuz/audioDataloader
Then I passed dataset_A and dataset_B to NonParallelSpecDataset.
After all of that I call train(args_scpt, datamodule), but it is not working.
Please help me use my own data and tell me what I should do from the beginning.

tarepan (Owner) commented Mar 4, 2021

@todalex thanks for trying this repo!
This notebook is for custom datasets:
https://github.com/tarepan/Scyclone-PyTorch/blob/main/Scyclone_PyTorch_other_dataset.ipynb
I hope it helps you.
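A minimal sketch of the kind of Dataset the notebook asks for (yielding linear spectrograms with n_fft=254 from 16 kHz waveforms) could look like this. The class name `SpecDataset`, the hop length, and the assumption that the waveforms are already loaded as 1-D tensors are all illustrative, not part of the repo:

```python
import torch
from torch.utils.data import Dataset


class SpecDataset(Dataset):
    """Sketch: yields linear amplitude spectrograms from preloaded 16 kHz waveforms.

    n_fft=254 gives n_fft // 2 + 1 = 128 frequency bins, as the notebook requires.
    """

    def __init__(self, waveforms, n_fft=254, hop_length=128):
        self.waveforms = waveforms  # list of 1-D float tensors (mono, 16 kHz)
        self.n_fft = n_fft
        self.hop_length = hop_length

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, idx):
        wave = self.waveforms[idx]
        spec = torch.stft(
            wave,
            n_fft=self.n_fft,
            hop_length=self.hop_length,
            window=torch.hann_window(self.n_fft),
            return_complex=True,
        )
        # Linear (amplitude) spectrogram, shape (n_fft // 2 + 1, n_frames)
        return spec.abs()
```

Two such datasets (one per speaker) could then be handed to NonParallelSpecDataset in place of NpVCC2016_spec.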

todalex (Author) commented Mar 4, 2021

@tarepan thanks for your answer.
I am using the notebook you mentioned, and I have a problem in the Dataset Preparation part: I only have some audio files, and I don't know how to build a dataset out of them that matches the object you create with NpVCC2016_spec. As I said, I followed your explanation:

> Input datum: linear spectrogram (n_fft=254, from 16kHz waveform), described in the original paper
> Input class: PyTorch-Lightning's DataModule
> You should prepare PyTorch's torch.utils.data.dataset.Dataset which yields the spectrogram by yourself.
> The code below wraps two (non-parallel) datasets within a DataModule.

but I think I didn't do it right.

Here is the code I used to make a dataset from my audio data:

```python
import numpy as np
import torch
import torch.utils.data as data
# parse_playlist, listDirectory_all, dataset_properties, create_sampling_index,
# choose_sequence_notsame, load_sequence, and paramManager come from the
# audioDataloader repo linked above.


class AudioDataset(data.Dataset):

    def __init__(self, sr, seqLen, stride, csvfile=None, datadir=None, extension=None,
                 paramdir=None, prop=None, transform=None, param_transform=None,
                 target_transform=None):
        if stride < 1:
            raise ValueError("stride has to be >= 1")

        if csvfile is not None:
            assert datadir is None, "Input either csvfile or data directory - Not both!"
            assert extension is None, "Do not input extension if csvfile is used!"
            self.filelist = parse_playlist(csvfile)
        elif datadir is not None:
            assert extension is not None, "Please input a file extension to use!"
            self.filelist, self.fnamelist = listDirectory_all(directory=datadir, fileExtList=extension)
        else:
            raise ValueError("Please input either a csvfile or data directory to read from!")

        self.datadir = datadir
        self.paramdir = paramdir
        self.prop = prop
        (self.fileLen, self.fileDuration, self.totalFileDuration, self.totalSamples,
         self.srInSec, self.seqLenInSec) = dataset_properties(self.filelist, sr, seqLen)
        self.sr = sr
        self.seqLen = seqLen
        self.stride = stride
        self.transform = transform
        self.param_transform = param_transform
        self.target_transform = target_transform
        self.indexLen = create_sampling_index(self.totalSamples, self.stride)

    def __len__(self):
        # Needed so DataLoader / random_split can query the dataset size.
        return self.indexLen

    def __getitem__(self, index):
        chooseFileIndex, startoffset = choose_sequence_notsame(index + 1, self.fileDuration, self.srInSec, self.stride)
        whole_sequence = load_sequence(self.filelist, chooseFileIndex, startoffset, self.seqLen, self.sr)
        while whole_sequence is None:  # if len(whole_sequence) < self.seqLen+1, pick another random section
            index = np.random.randint(self.indexLen)
            chooseFileIndex, startoffset = choose_sequence_notsame(index + 1, self.fileDuration, self.srInSec, self.stride)
            whole_sequence = load_sequence(self.filelist, chooseFileIndex, startoffset, self.seqLen, self.sr)
        assert len(whole_sequence) == self.seqLen + 1, str(len(whole_sequence))
        whole_sequence = whole_sequence.reshape(-1, 1)
        sequence = whole_sequence[:-1]
        target = whole_sequence[1:]
        input = sequence  # default, so `input` is always defined even without a transform
        if self.transform is not None:
            input = self.transform(sequence)
        if self.target_transform is not None:
            target = self.target_transform(target)
        if self.paramdir is not None:
            pm = paramManager.paramManager(self.datadir, self.paramdir)
            params = pm.getParams(self.filelist[chooseFileIndex])
            paramdict = pm.resampleAllParams(params, self.seqLen, startoffset,
                                             startoffset + self.seqLenInSec, self.prop, verbose=False)
            if self.param_transform is not None:
                paramtensor = self.param_transform(paramdict)
                # input = {**input, **paramtensor}  # combine audio samples and parameters here
                input = torch.cat((input, paramtensor), 1)  # input dim: (batch, seq, feature)

        return input, target
```

I create my dataset_A and dataset_B with this implementation of torch.utils.data.Dataset,
and after that I run the remaining cells up to the Train part, but there is an error in the trainer.fit method:

```
        dataset_full = NonParallelSpecDataset()
     53
---> 54     mod = n_full % self.batch_size
     55     self.dataset_train, self.dataset_val = random_split(
     56         dataset_full, [len(dataset_full) - n_data_val, n_data_val]

NameError: name 'n_full' is not defined
```
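From the traceback, `n_full` is used before it is defined; it presumably should be `len(dataset_full)`. A self-contained sketch of the likely fix, using stand-in objects (the names and the choice of validation-set size here are illustrative, inferred from the traceback, not taken from the repo):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-ins for the notebook's objects
dataset_full = TensorDataset(torch.randn(100, 4))
batch_size = 16

n_full = len(dataset_full)                    # the missing definition
mod = n_full % batch_size                     # now well-defined
n_data_val = mod if mod != 0 else batch_size  # illustrative validation-set size
dataset_train, dataset_val = random_split(
    dataset_full, [n_full - n_data_val, n_data_val])
```

The split sizes must sum to `len(dataset_full)`, which random_split enforces.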

I think the problem is with my implementation of the Dataset class.
Is there a better way to create whatever this project needs as dataset_A and dataset_B from my own audio files?

tarepan (Owner) commented Mar 4, 2021

OK, I understand the situation.

Can I check our assumptions first?
If you run the original notebook (https://github.com/tarepan/Scyclone-PyTorch/blob/main/Scyclone_PyTorch_other_dataset.ipynb), does it run correctly?

todalex (Author) commented Mar 5, 2021

@tarepan with some changes, yes. The dataset download part had an error, so I changed it like this:

```python
NpVCC2016_spec(train=True, download_corpus=True, speakers=["SF1"])
```

In the next cell I had errors for Dataset and LightningDataModule, so I imported them before that cell:

```python
from torch.utils.data import Dataset
from pytorch_lightning import LightningDataModule
from typing import Optional
from os import cpu_count
```

After the imports, I copied the DataLoaderPerformance class that NonParallelSpecDataModule needs:

```python
class DataLoaderPerformance:
    """PyTorch DataLoader performance configs.

    All attributes which affect performance of [torch.utils.data.DataLoader][^DataLoader] @ v1.6.0
    [^DataLoader]: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
    """

    def __init__(self, num_workers: Optional[int] = None, pin_memory: bool = True) -> None:
        """Default: num_workers == cpu_count & pin_memory == True"""

        # Design Note:
        #   Current performance is for single GPU training.
        #   cpu_count() is not appropriate under the multi-GPU condition.

        if num_workers is None:
            c = cpu_count()
            num_workers = c if c is not None else 0
        self.num_workers: int = num_workers
        self.pin_memory: bool = pin_memory
```

In the next cell, because I was unable to use the argparser part, I called DataLoaderPerformance like this:

```python
loader_perf = DataLoaderPerformance(None, True)
```
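Before handing these settings to trainer.fit, it can help to verify that the dataset yields correctly shaped batches through a DataLoader. A sketch with a stand-in dataset (the spectrogram shape (128, 160) and the batch size are illustrative; in the notebook the worker/pin-memory values would come from DataLoaderPerformance):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for dataset_A: 10 spectrograms of shape (128, 160)
dataset_A = TensorDataset(torch.randn(10, 128, 160))

loader = DataLoader(dataset_A, batch_size=4,
                    num_workers=0, pin_memory=False)
(batch,) = next(iter(loader))
# Each batch should have shape (batch, freq_bins, frames)
```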

For the last training cell, I am running in Colab, so the import part errors out because of the sys path; I mounted my Drive and added the path to sys.path:

```python
import sys
sys.path.insert(3, '/content/drive/MyDrive/Scyclone-PyTorch-main/scyclonepytorch')
```

After all that, it worked and started training.
