xlm-roberta and Mistral-7B take significant amounts of memory during compilation #2821

Open
cjvolzka opened this issue May 9, 2024 · 2 comments


cjvolzka commented May 9, 2024

While compiling models like HuggingFace protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I notice that compilation uses significantly more memory than the entire model size.

For example, the xlm-roberta-base-language-detection-onnx model is about 1.11 GB, but during compilation I see peaks of up to 9 GB of memory used by onnx-mlir, opt, and llc when compiling with --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT.

The Mistral-7B-v0.1 model is about 29 GB, but during compilation I see peaks of over 70 GB and sustained usage of 58 GB when compiling with --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT.
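For reference, one minimal way to reproduce such a peak-memory measurement on Linux is GNU time (model.onnx below is a placeholder for the actual model file):

# GNU time (not the shell builtin) reports peak resident memory for
# onnx-mlir and the child opt/llc processes it waits on.
/usr/bin/time -v onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 \
    --onnx-op-stats TXT model.onnx
# Look for "Maximum resident set size (kbytes)" in the report.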

Is there anything that can be done to reduce the compile-time memory required for these kinds of models?

@cjvolzka cjvolzka changed the title Models take significant amounts of memory to compile xlm-roberta and Mistral-7B take significant amounts of memory during compilation May 9, 2024

imaihal commented May 10, 2024

@cjvolzka How can we get the ONNX model for Mistral-7B-v0.1?

cjvolzka commented

@imaihal Sorry, I missed your question. Below is how I generated the Mistral ONNX model.

Notes:

  • I exported the model on my Mac, since the export tools don't support s390x. Afterward, I transferred the folder the export created (with the ONNX file and constants) to the s390x host to compile the model.
  • The huggingface-cli command will ask a couple of questions:
pip install huggingface_cli optimum
huggingface-cli login
optimum-cli export onnx --model mistralai/Mistral-7B-v0.1 --framework pt --atol 0.001 --task text-generation Mistral-7B-v0.1-text-generation
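
On the s390x host, compiling the exported model then looks roughly like this (a sketch; the model.onnx filename inside the exported folder is an assumption based on optimum-cli's usual output layout):

# Compile on s390x with the flags from the reports above;
# --store-constants-to-file writes the large constants to an external
# file instead of embedding them in the generated library.
onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 \
    --store-constants-to-file --onnx-op-stats TXT \
    Mistral-7B-v0.1-text-generation/model.onnx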

imaihal added a commit that referenced this issue Jul 22, 2024
* Write constant values to a single file without buffering to remove spikes in memory consumption.

This PR solves an issue of memory consumption reported in #2821.
We found that there was a spike in memory consumption when writing constants into a file (model.constants.bin). This happened because all constants were first collected into a single buffer and then written to the file at once. This PR changes the code to write each constant to the file without buffering, which removes the spike in memory consumption.

---------

Signed-off-by: Haruki Imai <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
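
A quick way to sanity-check the change after compiling with --store-constants-to-file (a hedged sketch; the output filenames are assumed to follow the input model's basename):

# Constants are now streamed to the external file one by one instead of
# being accumulated in a buffer first, so the write no longer spikes RSS.
ls -lh model.so model.constants.bin
# Re-running the compile under /usr/bin/time -v should show a lower
# "Maximum resident set size" than before this change.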