GPT Training Memory Estimation - NeMo Practice | Jianbin Chang #2577

2024-07-15T23:09:05Z

giscus[bot]
bot Jul 15, 2024

GPT Training Memory Estimation - NeMo Practice | Jianbin Chang

Nice blog.

https://shjwudp.github.io/blog/2023/gpt-training-memory-estimation-nemo-training-practice/

AlexBodner · 2024-07-15T23:09:06Z

Hi Jianbin, awesome article! I haven't found a more detailed one. Would you mind sharing the code for the memory estimation? I implemented it myself and I'm getting different results, so I'm probably missing something.
Best wishes!
Alex

0 replies

sowmen · 2024-08-29T08:54:01Z

What do tensor parallel size (t) and data parallel size (d) mean? For one node with 8xA6000 GPUs, what would the values be?

0 replies

liviclee · 2024-11-13T16:58:02Z

Good job. Bookmarked.

0 replies
