Better design pattern for data_weight synchronization #83

hanbinhu · 2021-03-26T07:53:16Z

The ready event used for dst_weight in neighbor_allreduce ensures that the data_weight computation has finished before communication starts, because the PyTorch CUDA stream is not synchronized with our own CUDA stream.
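For readers unfamiliar with the pattern, here is a minimal sketch of event-based ordering between two CUDA streams, written against the PyTorch Python API. It is not the code in this repository: `scale_then_handoff` and `comm_stream` are hypothetical names, and the sketch assumes the communication stream is available as a `torch.cuda.Stream`. On the native side, the equivalent calls would presumably be `cudaEventRecord` and `cudaStreamWaitEvent`.

```python
import torch


def scale_then_handoff(tensor, weight, comm_stream=None):
    """Scale `tensor` by `weight`, then make `comm_stream` wait for the result.

    Hypothetical helper for illustration only.
    Returns the scaled tensor and the ready event (None for CPU tensors).
    """
    # On CUDA this multiplication is enqueued on PyTorch's current stream.
    scaled = tensor * weight
    if not scaled.is_cuda:
        # CPU tensors are computed synchronously, so no event is needed.
        return scaled, None

    ready_event = torch.cuda.Event()
    # Record the event on the stream that produced `scaled`.
    ready_event.record(torch.cuda.current_stream(scaled.device))
    if comm_stream is not None:
        # Work enqueued on comm_stream after this call will not start
        # until the scaling kernel has finished.
        comm_stream.wait_event(ready_event)
    return scaled, ready_event


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.ones(4, device=device)
    # Stand-in for the library's own communication stream (assumption).
    stream = torch.cuda.Stream() if device == "cuda" else None
    y, event = scale_then_handoff(x, 0.5, stream)
    if event is not None:
        event.synchronize()  # block the host until the scaling is done
    print(y.cpu().tolist())
```

The point of the sketch is that the wait is placed on the consumer stream rather than on the host, so the CPU thread is not blocked while the weighted buffer is being computed.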

hanbinhu mentioned this issue Mar 26, 2021

Improve neighbor allreduce #78

Merged
