DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators #4833

nelyahu · 2023-12-18T08:58:33Z

Until now, the approach has been to replace the torch.nn.Parameter data with new CPU storage in order to free device memory.
All params are then flattened on the host and moved to the device.
On some accelerators, a torch.nn.Parameter that lives on the device cannot be assigned CPU storage.
This PR instead copies the param data into a new CPU tensor and shrinks the device storage.
Later, when the flat buffer is moved to the device, param.data becomes a view into the flat buffer.
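To illustrate what it means for param.data to be a view into the flat buffer, here is a minimal CPU-only PyTorch sketch. The two-parameter setup and the slice offsets are hypothetical stand-ins for a bit16 param group, not DeepSpeed code. Because each param's data aliases a slice of `flat`, a single in-place write to the flat buffer (for example, when updated weights are written back into it) is visible through every param without further copies.

```python
import torch
from torch import nn

# Hypothetical stand-ins for two params in one bit16 param group.
weight = nn.Parameter(torch.zeros(2, 3))
bias = nn.Parameter(torch.zeros(3))

# Build a single contiguous buffer holding both params back to back.
flat = torch.cat([weight.data.reshape(-1), bias.data.reshape(-1)])

# Re-point each param at its slice of the flat buffer (views, not copies).
weight.data = flat[0:6].view(2, 3)
bias.data = flat[6:9]

# One in-place write to the flat buffer updates both params at once.
flat.add_(1.0)
print(weight.data)
print(bias.data)
```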

Today DeepSpeedZeroOptimizer flattens the FP16 weights (which are on the device) by moving param.data to the CPU while keeping the same param object. This practice does not work with all device types. This commit introduces a new approach that avoids this data sharing:
1. Copy each param.data to a new CPU tensor.
2. Keep the param object on the device and shrink its storage.
3. Flatten the CPU copies into a single flat host tensor.
4. Move the flat tensor to the device.
5. Resize the device params back to their original shapes.
6. Point each param at its offset in the flat buffer.
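The six steps can be sketched in plain PyTorch as follows. This is a minimal illustration under stated assumptions, not the code in deepspeed/runtime/zero/stage_1_and_2.py: `flatten_params_via_host` is a hypothetical helper, all params are assumed to share one dtype and already live on `device`, and flattening uses `torch.cat` while ignoring any padding or alignment the real optimizer may apply. It runs on CPU so it can be tried anywhere; on an accelerator, `device` would be that accelerator's device string.

```python
import torch
from torch import nn


def flatten_params_via_host(params, device):
    """Hypothetical helper sketching the six steps from the commit message.

    Assumes every param already lives on `device` and all share one dtype.
    """
    # 1. Copy each param's data into a new, independent CPU tensor.
    cpu_copies = [p.data.to("cpu", copy=True) for p in params]
    shapes = [p.shape for p in params]

    # 2. Keep each param object on the device, but shrink its storage to an
    #    empty device tensor so the original device memory can be released.
    for p in params:
        p.data = torch.empty(0, dtype=p.dtype, device=device)

    # 3. Flatten the CPU copies into one contiguous host tensor.
    flat_cpu = torch.cat([t.reshape(-1) for t in cpu_copies])

    # 4. Move the flat buffer to the device in a single transfer.
    flat_device = flat_cpu.to(device)

    # 5-6. Give each param back its original shape as a view at its
    #      offset inside the device flat buffer.
    offset = 0
    for p, shape in zip(params, shapes):
        numel = shape.numel()
        p.data = flat_device.narrow(0, offset, numel).view(shape)
        offset += numel
    return flat_device


# Usage: a small layer with weight (3, 4) and bias (3,), 15 elements total.
layer = nn.Linear(4, 3)
params = list(layer.parameters())
before = [p.detach().clone() for p in params]
flat = flatten_params_via_host(params, device="cpu")
```

The point of the design, as illustrated here, is that the param objects are never handed CPU storage: they only ever hold device tensors (first an empty one, then a view into the flat buffer), which is what lets the approach work on accelerators that reject CPU storage on a device parameter.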

deepspeed/runtime/zero/stage_1_and_2.py

nelyahu · 2024-01-10T20:47:18Z

Hi @tjruwase, I fixed a failing unit test. Can you please re-trigger the workflow?

…elerators (microsoft#4833)

Co-authored-by: Olatunji Ruwase
Co-authored-by: Michael Wyatt

…more accelerators (microsoft#4833)" This reverts commit ade9836.

nelyahu and others added 2 commits December 12, 2023 13:40

Merge branch 'microsoft:master' into zeroOptParamsFlatenning

324e33f

nelyahu requested review from tjruwase and mrwyattii as code owners December 18, 2023 08:58

nelyahu changed the title ~~DeepSpeedZeroOptimizer: refactor bit16 flatenning to support more accelerators~~ DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators Dec 18, 2023