R.I.P.
👻
Ghosted
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
November 17, 2023 · Entered Twilight · European Conference on Computer Systems
Repo contents: CODE_OF_CONDUCT.md, CONTRIBUTING.md, LICENSE, NOTICE, README.md, THIRD-PARTY-LICENSES, docs, dynapipe, requirements.txt, scripts, setup.py, tests
Authors
Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu
arXiv ID
2311.10418
Category
cs.DC: Distributed Computing
Cross-listed
cs.LG
Citations
17
Venue
European Conference on Computer Systems
Repository
https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines
★ 19
Last Checked
1 month ago
Abstract
Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.
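The abstract describes constructing variable-length micro-batches with a dynamic-programming approach so that padding waste is reduced relative to fixed-length padding or packing. The sketch below is a minimal, illustrative take on that idea, not DynaPipe's actual algorithm or API: it sorts sequences by length, then uses DP to partition them into contiguous micro-batches, charging each micro-batch its padded-token cost plus a fixed per-micro-batch `overhead` (a stand-in for pipeline/launch cost; both `overhead` and the `max_tokens` capacity are assumed parameters introduced here for illustration).

```python
def min_cost_microbatches(lengths, max_tokens, overhead):
    """Partition variable-length sequences into micro-batches via DP.

    Sequences are sorted by length, so any contiguous slice lengths[i:j]
    is padded to lengths[j-1]. A micro-batch of that slice costs
    overhead + (j - i) * lengths[j - 1] and must fit within max_tokens.
    Returns (total cost, list of micro-batches as length lists).

    Illustrative only: DynaPipe's real objective models execution time,
    not this simplified token count.
    """
    lengths = sorted(lengths)
    n = len(lengths)
    INF = float("inf")
    cost = [INF] * (n + 1)   # cost[j] = min cost covering first j sequences
    cost[0] = 0
    split = [0] * (n + 1)    # split[j] = start index of the last micro-batch
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            padded = (j - i) * lengths[j - 1]  # sorted: max is lengths[j-1]
            if padded > max_tokens:
                break  # widening the slice only increases padded size
            if cost[i] + overhead + padded < cost[j]:
                cost[j] = cost[i] + overhead + padded
                split[j] = i
    # Walk the split table backwards to reconstruct the micro-batches.
    batches, j = [], n
    while j > 0:
        i = split[j]
        batches.append(lengths[i:j])
        j = i
    batches.reverse()
    return cost[n], batches


total, batches = min_cost_microbatches([12, 3, 11, 4, 10],
                                       max_tokens=100, overhead=8)
print(total, batches)  # 60 [[3, 4], [10, 11, 12]]
```

With the per-batch overhead, the DP trades off padding waste (large mixed-length batches) against launch cost (many tiny batches): here it groups the two short sequences together and the three long ones together, rather than padding everything to length 12.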
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Distributed Computing
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems