Decentralized Learning: Theoretical Optimality and Practical Improvements.
- Source :
- Journal of Machine Learning Research. 2023, Vol. 24, p1-62. 62p.
- Publication Year :
- 2023
Abstract
- Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in the known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction that this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound up to only a logarithmic gap. While a simple version of DeTAG with plain SGD and a constant step size suffices to achieve the theoretical limit, we additionally provide convergence bounds for DeTAG under general non-increasing step sizes and momentum. Empirically, we compare DeTAG with other decentralized algorithms on multiple vision benchmarks, including CIFAR10/100 and ImageNet. We substantiate our theory and show that DeTAG converges faster on unshuffled data and in sparse networks. Furthermore, we study a DeTAG variant, DeTAG*, that practically speeds up data-center-scale model training. This manuscript is the extended version of (Lu and De Sa, 2021).
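To make the abstract's setting concrete, below is a minimal sketch of one synchronous step of a generic gossip-style decentralized SGD update in the spirit of D-PSGD (each worker averages its neighbors' models via a doubly stochastic mixing matrix, then takes a local stochastic gradient step). This is an illustrative assumption-laden sketch, not the paper's DeTAG algorithm, which additionally uses gradient tracking and multi-step gossip; the function and variable names are hypothetical.

```python
import numpy as np

def gossip_sgd_step(X, W, grads, lr):
    """One synchronous gossip-SGD update for all workers (hypothetical helper).

    X     : (n_workers, dim) array of local model copies, one row per worker
    W     : (n_workers, n_workers) doubly stochastic mixing matrix encoding
            the (possibly sparse) communication graph
    grads : (n_workers, dim) local stochastic gradients, one row per worker
    lr    : step size
    """
    # Gossip averaging with neighbors, followed by a local SGD step.
    return W @ X - lr * grads

# Toy usage: 4 workers on a ring, each mixing with its two neighbors.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # local model copies
grads = rng.normal(size=(4, 3))    # stand-ins for real stochastic gradients
X = gossip_sgd_step(X, W, grads, lr=0.1)
```

The sparser the graph (i.e., the more zeros in W), the slower information mixes across workers, which is the regime where the abstract reports DeTAG converging faster than baselines.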
- Subjects :
- MACHINE learning
- INSTRUCTIONAL systems
- ALGORITHMS
- LOGARITHMS
Details
- Language :
- English
- ISSN :
- 1532-4435
- Volume :
- 24
- Database :
- Academic Search Index
- Journal :
- Journal of Machine Learning Research
- Publication Type :
- Academic Journal
- Accession number :
- 176355357