High RAM usage with factors+shuffle-in-ram: false #979

Open
eltorre opened this issue Feb 20, 2023 · 0 comments

eltorre commented Feb 20, 2023

Bug description

We are trying to train a model with factors, but we are running into out-of-memory problems:

  • With data shuffling enabled, training uses ~90 GB of RAM, regardless of the shuffle-in-ram setting.
  • The same model with shuffling disabled peaks at ~40 GB.
  • The baseline SPM model, which uses exactly the same data but without factors, with shuffle-in-ram: false, peaks at ~25 GB.
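
A quick sketch of the arithmetic behind these figures. The numbers are the approximate peaks quoted above; the dictionary keys are just labels made up for this example, and the split into "overheads" is only a rough reading of three measurements, not a profile of marian itself.

```python
# Approximate peak RAM in GB, as reported in the list above.
# The keys are ad hoc labels for this sketch, not marian option names.
peak_ram_gb = {
    "factors, shuffling on": 90,
    "factors, shuffling off": 40,
    "baseline SPM, shuffle-in-ram: false": 25,
}

# Extra RAM seen when shuffling is enabled for the factored model.
shuffle_overhead_gb = (
    peak_ram_gb["factors, shuffling on"] - peak_ram_gb["factors, shuffling off"]
)

# Factored model with shuffling vs. baseline with shuffling (not kept in RAM).
factored_vs_baseline_gb = (
    peak_ram_gb["factors, shuffling on"]
    - peak_ram_gb["baseline SPM, shuffle-in-ram: false"]
)

print(f"shuffling adds ~{shuffle_overhead_gb} GB to the factored model")
print(f"factored + shuffling is ~{factored_vs_baseline_gb} GB above the baseline")
```

Note that the 40 GB and 25 GB figures are not directly comparable, since the factored run had shuffling disabled while the baseline only had shuffle-in-ram: false.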

We are using factors-combine: sum, but we are not sure whether this has a large effect on RAM usage.

It seems marian is using significantly more RAM when shuffling data using factored models. Maybe it is ignoring shuffle-in-ram: false?

For reference, here are the vocab, factor, and valid-entry stats from the log, which look OK to me:

[2023-02-10 14:56:47] [vocab] Loading vocab spec file ../wd.all2022.en-fr.en-fr/vocab.en.new.fsv
[2023-02-10 14:56:47] [vocab] Factor group '(lemma)' has 32000 members
[2023-02-10 14:56:47] [vocab] Factor group '|d' has 114 members
[2023-02-10 14:56:47] [vocab] Factor group '|s' has 4 members
[2023-02-10 14:56:47] [vocab] Factor group '|c' has 3 members
[2023-02-10 14:56:47] [vocab] Factored-embedding map read with total/unique of 127985/32121 factors from 32000 example words (in space of 73,602,300)
[2023-02-10 14:56:47] [vocab] Expanding all valid vocab entries out of 73,602,300...
[2023-02-10 14:57:11] [vocab] Completed, total 43769165 valid combinations
[2023-02-10 14:57:11] [data] Setting vocabulary size for input 0 to 43,769,165

Context

Marian v1.11.0 f00d062 2022-02-08 08:39:24 -0800

We also observed the same behaviour with rev. 3c2a432

CMake command:
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DUSE_SENTENCEPIECE=ON \
  -DCOMPILE_CPU=on \
  -DUSE_STATIC_LIBS=on \
  -DUSE_FBGEMM=on

Comments

As a side question (and sorry to mix it with the bug), the size of the expanded space is:
(32000+1) * (114+1) * (4+1) * (3+1) = 73,602,300
To me it seems marian is reserving an extra vocab entry for UNK in each factor group, but an UNK factor will never actually occur. Is there a flag to disable this behaviour?
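
The calculation can be reproduced with a few lines of Python. The group sizes and the valid-combination count are taken from the log above. Reading the "+1" as an UNK slot is only my guess, and the "without extra slot" figure is a hypothetical for comparison, not something marian reports.

```python
from math import prod

# Factor group sizes as reported in the [vocab] log lines above.
group_sizes = {"(lemma)": 32000, "|d": 114, "|s": 4, "|c": 3}

# Space with one extra slot per group; matches the "in space of" figure in the log.
space_with_extra = prod(n + 1 for n in group_sizes.values())

# Hypothetical space if no extra slot were reserved per group.
space_without_extra = prod(group_sizes.values())

# From the log line "Completed, total 43769165 valid combinations".
valid_combinations = 43_769_165

print(f"with extra slot:    {space_with_extra:,}")
print(f"without extra slot: {space_without_extra:,}")
print(f"valid share of expanded space: {valid_combinations / space_with_extra:.1%}")
```

So the extra slot per group grows the space from 43,776,000 to 73,602,300, of which the log reports about 59% as valid combinations.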

Thanks a lot

@eltorre eltorre added the bug label Feb 20, 2023