Jang, Insu, et al. "Oobleck: Resilient distributed training of large models using pipeline templates." Proceedings of the 29th Symposium on Operating Systems Principles. 2023.;https://dl.acm.org/doi/abs/10.1145/3600006.3613152?casa_token=M6mZZ0le2V0AAAAA:oafed3aK4DXHHTsuywdjGpCagEw-DU2KczQ7hnirDT6CT8h_q0foSgAq18UQKIILKCQ8sUUzDaKq;14;Although this paper does not target heterogeneous clusters, section 4.2.2 formulates the static optimal batch distribution across heterogeneous pipelines built from homogeneous devices (e.g., pipeline A has 3 nodes, pipeline B has 4 nodes); a sketch of that idea follows this entry.
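
A minimal sketch (not Oobleck's actual code) of the static-distribution idea: assign microbatches to each pipeline in proportion to its estimated throughput, so all pipelines finish an iteration at roughly the same time. The function name and throughput inputs are assumptions for illustration.

    # Sketch: proportional static microbatch distribution across pipelines.
    def distribute_microbatches(throughputs, total_microbatches):
        """throughputs: estimated samples/sec (or 1/iteration-time) per pipeline."""
        total = sum(throughputs)
        # Proportional assignment, floored to integers.
        counts = [int(total_microbatches * t / total) for t in throughputs]
        # Hand out any remainder to the fastest pipelines first.
        remainder = total_microbatches - sum(counts)
        order = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
        for i in order[:remainder]:
            counts[i] += 1
        return counts

    # Example: pipeline A (3 nodes) vs. pipeline B (4 nodes), 14 microbatches total.
    print(distribute_microbatches([3.0, 4.0], 14))  # -> [6, 8]
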
Jia, Xianyan, et al. "Whale: Efficient giant model training over heterogeneous GPUs." 2022 USENIX Annual Technical Conference (USENIX ATC 22). 2022.;https://www.usenix.org/conference/atc22/presentation/jia-xianyan;33;See section 3.3.1 for dynamic workload (mini-batch) shifting; a sketch of that idea follows this entry.
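
A minimal sketch (assumed interface, not Whale's API) of dynamic mini-batch shifting: after each iteration, move samples from the slowest replica to the fastest one based on measured step times, keeping the global batch size fixed.

    # Sketch: shift mini-batch samples from the slowest to the fastest replica.
    def rebalance(batch_sizes, step_times, shift=1):
        slowest = max(range(len(step_times)), key=lambda i: step_times[i])
        fastest = min(range(len(step_times)), key=lambda i: step_times[i])
        if slowest != fastest and batch_sizes[slowest] > shift:
            batch_sizes[slowest] -= shift
            batch_sizes[fastest] += shift
        return batch_sizes

    # Example: a faster and a slower GPU replica, both starting at 32 samples.
    print(rebalance([32, 32], [0.8, 1.3]))  # -> [33, 31]
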
Li, Dacheng, et al. "AMP: Automatically finding model parallel strategies with heterogeneity awareness." Advances in Neural Information Processing Systems 35 (2022): 6630-6639.;https://proceedings.neurips.cc/paper_files/paper/2022/file/2b4bfa1cebe78d125fefd7ea6ffcfc6d-Paper-Conference.pdf;7;See section 3.6 on statically enumerating mini-batch sizes for each pipeline; a sketch of that idea follows this entry.
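
A minimal sketch (not AMP's code; the grid step and per-sample cost model are assumptions) of static enumeration: try candidate splits of the global mini-batch across pipelines and keep the split whose slowest pipeline finishes earliest.

    # Sketch: enumerate per-pipeline mini-batch sizes and pick the split that
    # minimizes the time of the slowest pipeline under a per-sample cost model.
    from itertools import product

    def best_split(global_batch, per_sample_time, step=2):
        n = len(per_sample_time)
        best, best_cost = None, float("inf")
        for split in product(range(step, global_batch, step), repeat=n):
            if sum(split) != global_batch:
                continue
            cost = max(b * t for b, t in zip(split, per_sample_time))
            if cost < best_cost:
                best, best_cost = split, cost
        return best, best_cost

    # Example: two pipelines, the second ~2x slower per sample, 32 samples total.
    print(best_split(32, [1.0, 2.0]))  # -> ((22, 10), 22.0)
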