
Questions about the model performance #3

Open
YiyuLuo opened this issue Jul 23, 2019 · 9 comments

Comments


YiyuLuo commented Jul 23, 2019

Hi! Thank you very much for your great work!
I've also been working on this paper recently, but training a model exactly as described in the paper takes a long time, and I haven't gotten good results so far.
I noticed that you changed some layers of the network. I'm also wondering whether it's possible to build a smaller model while keeping the performance. Could you please tell me how your modified model performs?

@RemiRigal

Hi @YiyuLuo,

I'm not the author of this repository, but I'm currently implementing a PyTorch version of the network described in the paper.

Regarding the size of the model, I think it is reasonable to shrink some layers. An interesting part of the paper is the ablation study (Table 6): it shows that some parts of the model, such as the fully connected layers, contribute little to the final performance.
Considering only the magnitude mask for the audio is also quite relevant: it halves the input size, and the performance loss is small.
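Not from this repo, but as a rough illustration of the factor-of-2 point above: a complex mask needs two channels (real and imaginary) per time-frequency bin, while a magnitude mask needs only one. A minimal NumPy sketch, where the 100-frame, 257-bin spectrogram shape is an illustrative assumption, not a value from any implementation:

```python
import numpy as np

# Illustrative STFT-like complex spectrogram: (time, freq).
rng = np.random.default_rng(0)
spec = rng.standard_normal((100, 257)) + 1j * rng.standard_normal((100, 257))

# Complex-mask representation: stack real and imaginary parts -> 2 channels.
complex_feats = np.stack([spec.real, spec.imag], axis=-1)   # shape (100, 257, 2)

# Magnitude-only representation: a single channel.
magnitude_feats = np.abs(spec)[..., np.newaxis]             # shape (100, 257, 1)

# The magnitude representation is half the size of the complex one.
print(complex_feats.size // magnitude_feats.size)  # → 2
```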

That said, I think that reducing the three FC layers to a single FC layer of 100 units (as done by @mayurnewase) may not be sufficient to retain the full complexity of the output masks.
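For a concrete sense of the trade-off, here is a back-of-the-envelope parameter count for the two FC-head configurations. The input and output dimensions (400 BLSTM features in, a 257 × 2 × 2 complex mask for 2 speakers out) are illustrative assumptions, not values taken from this repository:

```python
def fc_params(dims):
    """Total weights + biases for a chain of fully connected layers
    whose successive widths are given by `dims`."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

blstm_out = 400           # assumed BLSTM output width
mask_out = 257 * 2 * 2    # freq bins * (real, imag) * 2 speakers = 1028

paper_head = fc_params([blstm_out, 600, 600, 600, mask_out])
small_head = fc_params([blstm_out, 100, mask_out])

print(f"three 600-unit FC layers: {paper_head:,} params")  # → 1,579,628
print(f"one 100-unit FC layer:    {small_head:,} params")  # → 143,928
```

The single 100-unit layer cuts the head's parameters by roughly a factor of 10, which is exactly why it risks being too small a bottleneck for the masks.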

@YiyuLuo
Author

YiyuLuo commented Nov 9, 2019

> I think that reducing the three FC layers to a single FC layer of 100 units (as done by @mayurnewase) may not be sufficient to retain the full complexity of the output masks.

Thanks for your reply!
I tried dropping the three FC layers in an audio-only model with 2 speakers. However, the performance was poor, unlike the results reported in the paper.

@RemiRigal

> I tried dropping the three FC layers in an audio-only model with 2 speakers. However, the performance was poor, unlike the results reported in the paper.

What was the size of your three FC layers?

@YiyuLuo
Author

YiyuLuo commented Nov 12, 2019

> What was the size of your three FC layers?

The same as in the paper: 600 units each.

@RemiRigal

> The same as in the paper: 600 units each.

How much of the AVSpeech dataset did you use? I don't get results as good as those in the paper, but they are quite satisfying, and I use a lighter model trained on only 15% of their dataset.

@YiyuLuo
Author

YiyuLuo commented Nov 13, 2019

> How much of the AVSpeech dataset did you use?

Due to some policy reasons, the AVSpeech dataset is not available to me. I used the GRID dataset instead, about 20,000 speech clips in total.

@RemiRigal

RemiRigal commented Nov 14, 2019

> Due to some policy reasons, the AVSpeech dataset is not available to me. I used the GRID dataset instead, about 20,000 speech clips in total.

I'm still able to download the AVSpeech dataset from this page. Is the website unavailable for you?

@YiyuLuo
Author

YiyuLuo commented Nov 14, 2019

The website itself is available, but mainland China can't access YouTube.

@saarthak-kapse

> I'm not the author of this repository, but I'm currently implementing a PyTorch version of the network described in the paper.

Hey, I am also using PyTorch but I'm not able to get good results. Can you help me? Could I get your Gmail address so that I can discuss it with you?
