torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
April 21, 2020 · Entered Twilight · arXiv.org
"Last commit was 5.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .github, .gitignore, .readthedocs.yml, .travis.yml, CONTRIBUTING.md, LICENSE, NOTICE, README.ko.md, README.md, benchmarks, docs, setup.cfg, setup.py, stubs, tests, torchgpipe, torchgpipe_balancing.py
Authors
Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, Sungwoong Kim
arXiv ID
2004.09910
Category
cs.DC: Distributed Computing
Cross-listed
cs.LG
Citations
60
Venue
arXiv.org
Repository
https://github.com/kakaobrain/torchgpipe
★ 864
Last Checked
1 month ago
Abstract
We design and implement a ready-to-use library in PyTorch for performing micro-batch pipeline parallelism with checkpointing proposed by GPipe (Huang et al., 2019). In particular, we develop a set of design components to enable pipeline-parallel gradient computation in PyTorch's define-by-run and eager execution environment. We show that each component is necessary to fully benefit from pipeline parallelism in such an environment, and demonstrate the efficiency of the library by applying it to various network architectures including AmoebaNet-D and U-Net. Our library is available at https://github.com/kakaobrain/torchgpipe .
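To make the idea of micro-batch pipeline parallelism concrete, here is a minimal pure-Python sketch of a GPipe-style forward schedule. It is not the library's code: the function name `pipeline_schedule` and its parameters are hypothetical and chosen for illustration only. The sketch assumes the model is split into sequential partitions (one per device) and the mini-batch into micro-batches, so that micro-batch `i` can enter partition `j` at clock tick `i + j`.

```python
from typing import Iterator, List, Tuple


def pipeline_schedule(
    num_microbatches: int, num_partitions: int
) -> Iterator[List[Tuple[int, int]]]:
    """Yield, for each clock tick, the (micro-batch, partition) tasks
    that can run concurrently in a GPipe-style forward pass.

    Hypothetical helper for illustration; not part of any library API.
    """
    total_ticks = num_microbatches + num_partitions - 1
    for tick in range(total_ticks):
        # Micro-batch i sits on partition (tick - i); keep only valid pairs.
        first = max(0, tick - num_partitions + 1)
        last = min(tick + 1, num_microbatches)
        yield [(i, tick - i) for i in range(first, last)]


# Example: 4 micro-batches flowing through 3 partitions.
schedule = list(pipeline_schedule(4, 3))
for tick, tasks in enumerate(schedule):
    print(tick, tasks)
```

Under these assumptions, the forward pass takes `num_microbatches + num_partitions - 1` ticks, and each partition is idle for `num_partitions - 1` of them, which is why splitting the mini-batch into more micro-batches reduces the idle share of the pipeline.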
Similar Papers
In the same crypt - Distributed Computing
R.I.P.
👻
Ghosted
R.I.P.
👻
Ghosted
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
R.I.P.
👻
Ghosted
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
R.I.P.
👻
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
👻
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
👻
Ghosted