Block size, parallelism and predictive performance: finding the sweet spot in distributed learning.
- Source :
- International Journal of Parallel, Emergent & Distributed Systems. May 2024, Vol. 39 Issue 3, p379-398. 20p.
- Publication Year :
- 2024
Abstract
- As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size. [ABSTRACT FROM AUTHOR]
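The abstract's core idea — training base models on fixed-size data blocks, then keeping only the best of them in the ensemble — can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' code: the block count, the "keep the top-k by validation score" heuristic, and the scikit-learn models are all assumptions standing in for the paper's block sizes and heuristics.

```python
# Hypothetical sketch (not the paper's implementation): train one base
# model per data "block", keep only the best-scoring base models, and
# compare the pruned ensemble against a standard Random Forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

n_blocks = 8  # stands in for the block-size choice (e.g. 32 MB blocks)
blocks = np.array_split(np.arange(len(X_tr)), n_blocks)

# One tree per block, scored on a held-out validation split
# (a simple "fewer but better base models" heuristic).
scored = []
for idx in blocks:
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    scored.append((tree.score(X_val, y_val), tree))

top_k = [m for _, m in sorted(scored, key=lambda t: -t[0])[:4]]

# Majority vote over the selected base models.
votes = np.stack([m.predict(X_te) for m in top_k])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
ensemble_acc = float((ensemble_pred == y_te).mean())

baseline_acc = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(f"pruned block ensemble: {ensemble_acc:.3f}  random forest: {baseline_acc:.3f}")
```

In the paper's actual setting the blocks come from a distributed file system and the heuristics are evaluated across four datasets; the sketch only shows the selection-by-quality mechanism in miniature.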
- Subjects :
- *MACHINE learning
*RANDOM forest algorithms
*CLASSROOM environment
Details
- Language :
- English
- ISSN :
- 1744-5760
- Volume :
- 39
- Issue :
- 3
- Database :
- Academic Search Index
- Journal :
- International Journal of Parallel, Emergent & Distributed Systems
- Publication Type :
- Academic Journal
- Accession number :
- 176614513
- Full Text :
- https://doi.org/10.1080/17445760.2023.2225854