GPT Training Memory Estimation - NeMo Practice | Jianbin Chang #2577
Replies: 3 comments
-
Hi Jianbin, awesome article; I haven't found a more detailed one! Would you mind sharing the code for the memory estimation? I implemented it myself and I'm getting different results, so I'm probably missing something.
-
What do tensor parallel size (t) and data parallel size (d) mean? For one node with 8x A6000 GPUs, what would the values be?
-
Good job, mark.
-
Nice blog.
https://shjwudp.github.io/blog/2023/gpt-training-memory-estimation-nemo-training-practice/