I'm currently an MSCS student at CMU interested in Distributed Systems, Machine Learning Systems, Computer Architecture, and Blockchain. I code in C++, Python, Go, and Rust.
This work was accepted at MLSys '21 (the Fourth Conference on Machine Learning and Systems).
- Link to the publication: https://arxiv.org/abs/2102.02344
- Link to the code: https://github.com/UofT-EcoSystem/hfta
- Link to the website: https://uoft-ecosystem.github.io/hfta/
Our study reveals that single-accelerator training jobs can dominate cluster-wide resource consumption when launched repetitively (e.g., for hyper-parameter tuning), while severely underutilizing the hardware. We therefore propose Horizontally Fused Training Array (HFTA), a DL framework extension that horizontally fuses the models from such repetitive jobs and trains them simultaneously on a shared accelerator. HFTA substantially improves hardware utilization on GPUs (V100, A100, RTX6000) and TPUs (v3) across multiple benchmarks (DCGAN, PointNet, ResNet, MobileNet, Transformer, BERT), achieving up to 15.1x higher training throughput than the conventional practice of running one job per accelerator.
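To give a feel for the idea, here is a minimal, self-contained sketch of horizontal fusion (illustrative only; it does not use HFTA's actual API): several copies of the same tiny model, as you would have across hyper-parameter trials, are stacked along a leading "model" dimension and trained together with one batched matmul, so a single GPU kernel serves all trials at once.

```python
import torch

num_models = 4                        # e.g., 4 hyper-parameter trials fused together
in_dim, out_dim, batch_size = 32, 8, 16
device = "cuda" if torch.cuda.is_available() else "cpu"

# Parameters of all fused models, stacked along a leading "model" dimension.
weights = torch.randn(num_models, in_dim, out_dim, device=device, requires_grad=True)
biases = torch.zeros(num_models, 1, out_dim, device=device, requires_grad=True)
optimizer = torch.optim.SGD([weights, biases], lr=1e-2)

for step in range(100):
    # Independent data per trial: shape (num_models, batch_size, in_dim).
    x = torch.randn(num_models, batch_size, in_dim, device=device)
    y = torch.randn(num_models, batch_size, out_dim, device=device)

    # One batched matmul runs the forward pass of every trial simultaneously.
    pred = torch.bmm(x, weights) + biases

    # Per-trial MSE losses; summing them lets a single backward pass cover all
    # trials while keeping their gradients independent of one another.
    loss = ((pred - y) ** 2).mean(dim=(1, 2)).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the per-trial losses are additively separable, each model slice still receives exactly the gradients it would have gotten if trained alone, while the accelerator executes one large fused operator instead of many small under-utilizing ones.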