GNMT training is slow #668
Replies: 10 comments
-
Batch size=16 does not seem reasonable. Did you try increasing it?
-
@ymjiang There's no way to increase it any further. It already takes up about 10 GB of GPU memory.
-
cc @sxjscience. One way to work around the memory limit is to perform gradient accumulation. Several model zoo scripts already implement this logic, and it might be a good idea to add it to GNMT too.
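For context, this is roughly what gradient accumulation looks like in Gluon; the `net`, `loss_fn`, and `train_data` names below are placeholders for whatever the GNMT script sets up, not its actual API:

```python
# Rough sketch of gradient accumulation with MXNet Gluon. `net`, `loss_fn`,
# and `train_data` are placeholders, not objects from train_gnmt.py.
from mxnet import autograd, gluon

accumulate = 4  # number of micro-batches per parameter update

# Accumulate new gradients onto the existing ones instead of overwriting them.
params = [p for p in net.collect_params().values() if p.grad_req != 'null']
for p in params:
    p.grad_req = 'add'

trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})

for i, (src_seq, tgt_seq) in enumerate(train_data):
    with autograd.record():
        loss = loss_fn(net(src_seq), tgt_seq)
    loss.backward()
    if (i + 1) % accumulate == 0:
        # Normalize by the total number of samples accumulated, then clear
        # the gradients for the next accumulation window.
        trainer.step(accumulate * src_seq.shape[0])
        for p in params:
            p.zero_grad()
```

This keeps the per-step memory footprint of a batch size of 16 while the effective batch size for the optimizer update is 64.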
-
Will take a look at it. |
-
@zhreshold Using
-
@szhengac Using multiple workers enables pre-fetching of batches (by default two batches per worker), and each worker fetches a batch simultaneously.
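As a small illustration, turning on worker processes is just a DataLoader argument; `dataset` and `batchify_fn` here stand in for whatever train_gnmt builds, not its actual objects:

```python
# Illustrative DataLoader setup with worker processes; `dataset` and
# `batchify_fn` are placeholders for the GNMT script's own objects.
from mxnet import gluon

train_loader = gluon.data.DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=2,            # batches are prepared in separate CPU processes
    batchify_fn=batchify_fn,  # e.g. padding for variable-length sequences
)
```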
-
Now I've saved GPU memory by setting share_embed, and I increased the batch size. It gives some improvement.
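To illustrate why this helps: sharing the embedding means the source and target sides reuse one weight matrix instead of allocating two. The sketch below is a generic Gluon weight-tying pattern, not the exact GNMT model code:

```python
# Generic Gluon weight-tying sketch: both embeddings resolve to the same
# underlying weight, so only one vocab_size x embed_size matrix is allocated.
from mxnet import gluon

vocab_size, embed_size = 320783, 512

src_embed = gluon.nn.Embedding(vocab_size, embed_size)
# Passing the source embedding's parameters makes the target embedding
# reuse them instead of creating its own weight matrix.
tgt_embed = gluon.nn.Embedding(vocab_size, embed_size,
                               params=src_embed.params)
```

With a vocabulary of 320783 tokens, a single embedding matrix is large, so sharing it is a noticeable saving.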
-
@zhreshold But each input batch is simply an integer matrix, which shouldn't occupy too much GPU memory.
-
@szhengac Did you mean GPU memory? The DataLoader should never occupy GPU memory.
-
@zhreshold Oh, I meant CPU memory.
-
The GPU cannot be kept busy by train_gnmt.
GPU: GTX TITAN X
CPU: 4770K
GPU load: about 10%
CPU load: one core fully used
dataset: LCSTS
vocab size: 320783
num_workers: 0 (setting it to 2 seems to improve throughput, but it takes up too much memory, about six times as much as with 0)
batch size: 16