Author: "Qizhen Weng" / Topic: load balancing (computing) - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Qizhen Weng"' showing total 3 results

Start Over Author "Qizhen Weng" Topic load balancing (computing)

3 results on '"Qizhen Weng"'

1. Accelerating Distributed Learning in Non-Dedicated Environments

Author: Baochun Li, Bo Li, Qizhen Weng, Chen Chen, and Wei Wang
Subjects: Computer Networks and Communications, Computer science, business.industry, Distributed computing, Cloud computing, Sample (statistics), Python (programming language), Load balancing (computing), Computer Science Applications, Load management, Hardware and Architecture, Software deployment, Synchronization (computer science), Key (cryptography), business, computer, Software, Information Systems, computer.programming_language
Abstract: Machine learning (ML) models are increasingly trained with distributed workers possessing heterogeneous resources. In such scenarios, model training efficiency may be negatively affected by \emph{stragglers}---workers that run much slower than others. Efficient model training requires eliminating such stragglers, yet for modern ML workloads, existing load balancing strategies are inefficient and even infeasible. In this paper, we propose a novel strategy, called \emph{semi-dynamic load balancing}, to eliminate stragglers of distributed ML workloads. The key insight is that ML workers shall be load-balanced at \emph{iteration boundaries}, being non-intrusive to intra-iteration execution. Based on it we further develop LB-BSP, an integrated worker coordination mechanism that adapts workers' load to their instantaneous processing capabilities---by right-sizing the sample batches at the synchronization barriers. We have designed distinct load tuning algorithms for ML in CPU clusters, in GPU clusters as well as in federated learning setups, based on their respective characteristics. LB-BSP has been implemented as a Python module for ML frameworks like TensorFlow and PyTorch. Our EC2 deployment confirms that LB-BSP is practical, effective and light-weight, and is able to accelerating distributed training by up to $54\%$ .
Published: 2023
Full Text: View/download PDF

2. Fast Distributed Deep Learning via Worker-adaptive Batch Sizing

Author: Wei Wang, Chen Chen, Baochun Li, Bo Li, and Qizhen Weng
Subjects: 020203 distributed computing, Nonlinear autoregressive exogenous model, business.industry, Computer science, Deep learning, Distributed computing, 02 engineering and technology, Load balancing (computing), Sizing, 020202 computer hardware & architecture, Recurrent neural network, 0202 electrical engineering, electronic engineering, information engineering, Distributed learning, Artificial intelligence, Training load, business
Abstract: In heterogeneous or shared clusters, distributed learning processes are slowed down by straggling workers. In this work, we propose LB-BSP, a new synchronization scheme that eliminates stragglers by adapting each worker's training load (batch size) to its processing capability. For training in shared production clusters, a prerequisite for deciding the workers' batch sizes is to know their processing speeds before each iteration starts. To this end, we adopt NARX, an extended recurrent neural network that accounts for both the historical speeds and the driving factors such as CPU and memory in prediction.
Published: 2018
Full Text: View/download PDF

3. Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments

Author: Bo Li, Qizhen Weng, Wei Wang, Chen Chen, and Baochun Li
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Performance, Computer Science - Artificial Intelligence, Computer science, Distributed computing, Dynamic load balancing, 02 engineering and technology, Load balancing (computing), Machine Learning (cs.LG), Performance (cs.PF), Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Software deployment, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Distributed learning, Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Machine learning (ML) models are increasingly trained in clusters with non-dedicated workers possessing heterogeneous resources. In such scenarios, model training efficiency can be negatively affected by stragglers -- workers that run much slower than others. Efficient model training requires eliminating such stragglers, yet for modern ML workloads, existing load balancing strategies are inefficient and even infeasible. In this paper, we propose a novel strategy called semi-dynamic load balancing to eliminate stragglers of distributed ML workloads. The key insight is that ML workers shall be load-balanced at iteration boundaries, being non-intrusive to intra-iteration execution. We develop LB-BSP based on such an insight, which is an integrated worker coordination mechanism that adapts workers' load to their instantaneous processing capabilities by right-sizing the sample batches at the synchronization barriers. We have custom-designed the batch sizing algorithm respectively for CPU and GPU clusters based on their own characteristics. LB-BSP has been implemented as a Python module for ML frameworks like TensorFlow and PyTorch. Our EC2 deployment confirms that LB-BSP is practical, effective and light-weight, and is able to accelerating distributed training by up to $54\%$.
Published: 2018
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

3 results on '"Qizhen Weng"'

1. Accelerating Distributed Learning in Non-Dedicated Environments

2. Fast Distributed Deep Learning via Worker-adaptive Batch Sizing

3. Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

Publisher

3 results on '"Qizhen Weng"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources