Success with OPT-175B #1

taesiri · 2022-07-26T02:42:49Z

Hello,

Thank you for sharing this great implementation with the community.

I just wanted to open this issue to share my success in running the OPT-175B model on a DGX station.

The model takes ~3 minutes to load and uses ~58% of memory on each of the first 7 GPUs and ~28% on the last one.

Please feel free to close this issue.

BenfengXu · 2022-08-30T03:37:16Z

Congratulations! May I ask what the specific configuration of your DGX station is? Is it 8x A100 (40GB) or 8x A100 (80GB)?

taesiri · 2022-08-30T10:21:15Z

@BenfengXu You need the 8x80GB variant, as the model does not fit in 8x40GB (unless you do some tricks like this).
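
For anyone wondering why 8x40GB falls short, here is a rough back-of-the-envelope sketch. It assumes the weights are stored in fp16 (2 bytes per parameter), counts only the weights (no activations, KV cache, or framework overhead), and uses decimal GB. The function names are made up for illustration and are not part of this repo.

```python
# Rough estimate only: fp16 weights, no activations or runtime overhead.

def weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    return n_params * bytes_per_param / 1e9


def fits(required_gb: float, n_gpus: int, gb_per_gpu: float) -> bool:
    """True if the weights fit in the combined GPU memory."""
    return required_gb <= n_gpus * gb_per_gpu


required = weight_gb(175e9)  # 175B parameters at 2 bytes each

for gb_per_gpu in (40, 80):
    total = 8 * gb_per_gpu
    verdict = "fits" if fits(required, 8, gb_per_gpu) else "does not fit"
    print(f"8x{gb_per_gpu}GB = {total} GB total, weights need {required:.0f} GB: {verdict}")
```

Under these assumptions the weights alone need about 350 GB, which is more than the 320 GB available on 8x40GB but well within the 640 GB on 8x80GB. It is also roughly in line with the usage reported above (7 GPUs at ~58% plus one at ~28% of 80GB).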
