1. Hybrid-order distributed SGD: Balancing communication overhead, computational complexity, and convergence rate for distributed learning.
- Authors
- Omidvar, Naeimeh; Hosseini, Seyed Mohammad; Maddah-Ali, Mohammad Ali
- Subjects
- Optimization algorithms; Computational complexity; Generalization; Prior learning; Scalability
- Abstract
Communication overhead, computation load, and convergence speed are three major challenges to the scalability of distributed stochastic optimization algorithms for training large neural networks. In this paper, we propose hybrid-order distributed stochastic gradient descent (HO-SGD), which strikes a better balance among these three than previous methods for a general class of non-convex stochastic optimization problems. In particular, we show that by properly interleaving zeroth-order and first-order gradient updates, it is possible to significantly reduce the communication and computation overheads while guaranteeing fast convergence. The proposed method attains the same order of convergence rate as the fastest distributed methods (i.e., fully synchronous SGD), while having significantly lower computational complexity and communication overhead per iteration, and the same order of communication overhead as state-of-the-art communication-efficient methods, with order-wise lower computational complexity. Moreover, it improves the convergence rate of zeroth-order SGD methods by an order. Finally, and remarkably, empirical studies demonstrate that the proposed hybrid-order approach provides significantly higher test accuracy and superior generalization than all baselines, owing to its novel exploration mechanism. • This paper proposes the novel approach of hybrid-order optimization and learning, which strikes a better balance between communication overhead, computational complexity, and convergence rate for distributed optimization and learning than previous methods. • The proposed method solves a general class of non-convex stochastic optimization problems with guaranteed convergence to a stationary point.
• The proposed method guarantees the same order of convergence rate (in terms of the number of iterations and worker nodes) as the fastest distributed methods (i.e., fully synchronous SGD), while having significantly lower computational complexity and communication overhead per iteration. • The proposed method guarantees the same order of communication overhead as state-of-the-art communication-efficient methods, with order-wise lower computational complexity. • The proposed method improves the convergence rate of zeroth-order SGD methods by an order. • Empirical studies demonstrate that the proposed hybrid-order approach provides significantly higher test accuracy and superior generalization than all baselines. • The paper proposes a novel exploration mechanism that results in better generalization. [ABSTRACT FROM AUTHOR]
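The core idea of interleaving zeroth-order and first-order updates can be illustrated with a minimal single-worker sketch on a toy quadratic objective. This is a hypothetical illustration only: the schedule (`first_order_every`), step size, and two-point gradient estimator below are illustrative assumptions, and the paper's actual HO-SGD schedule, batching, and distributed averaging across worker nodes are not reproduced here.

```python
import numpy as np

def loss(x):
    # Toy objective: 0.5 * ||x||^2, minimized at x = 0.
    return 0.5 * np.sum(x ** 2)

def grad(x):
    # Exact first-order gradient of the toy quadratic.
    return x

def zo_grad(x, rng, mu=1e-4):
    # Two-point zeroth-order gradient estimate: needs only two
    # function evaluations, no backpropagation.
    u = rng.standard_normal(x.shape)
    return (loss(x + mu * u) - loss(x - mu * u)) / (2 * mu) * u

def hybrid_sgd(x0, steps=200, lr=0.1, first_order_every=5):
    # Interleave: an exact (first-order) update every few iterations,
    # cheap zeroth-order updates in between.
    x = x0.copy()
    rng = np.random.default_rng(0)
    for t in range(steps):
        if t % first_order_every == 0:
            g = grad(x)
        else:
            g = zo_grad(x, rng)
        x -= lr * g
    return x

x = hybrid_sgd(np.ones(10))
print(loss(x))  # loss shrinks toward 0
```

The zeroth-order steps trade gradient accuracy for lower per-iteration cost, while the occasional first-order steps keep the overall convergence rate up, which is the balance the abstract describes.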
- Published
- 2024