
SlimML: Removing Non-Critical Input Data in Large-Scale Iterative Machine Learning.

Authors:
Han, Rui
Liu, Chi Harold
Li, Shilin
Chen, Lydia Y.
Wang, Guoren
Tang, Jian
Ye, Jieping
Source:
IEEE Transactions on Knowledge & Data Engineering; May 2021, Vol. 33, Issue 5, p2223-2236, 14p
Publication Year:
2021

Abstract

The core of many large-scale machine learning (ML) applications, such as neural networks (NN), support vector machines (SVM), and convolutional neural networks (CNN), is a training algorithm that iteratively updates model parameters by processing massive datasets. Across the plethora of studies aiming at accelerating ML, such as data parallelization and parameter servers, the prevalent assumption is that all data points are equivalently relevant to model parameter updating. In this article, we challenge this assumption by proposing a criterion to measure a data point's effect on model parameter updating, and experimentally demonstrate that the majority of data points are non-critical in the training process. We develop a slim learning framework, termed SlimML, which trains the ML models only on the critical data and thus significantly improves training performance. To this end, SlimML efficiently leverages a small number of aggregated data points per iteration to approximate the criticalness of the original input data instances. The proposed approach can be adopted by changing a few lines of code in a standard stochastic gradient descent (SGD) procedure, and we demonstrate experimentally, on NN regression, SVM classification, and CNN training, that for large datasets it accelerates the model training process by an average of 3.61 times while incurring accuracy losses of only 0.37 percent. [ABSTRACT FROM AUTHOR]
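
The abstract's claim that SlimML needs only a few changed lines in a standard SGD loop can be illustrated with a sketch. The Python/NumPy snippet below is a hypothetical illustration under simplifying assumptions, not the authors' implementation: the names (keep_frac, n_aggregates) and the crude label-based grouping heuristic are invented for the example. It approximates each point's criticalness from a few aggregated points per iteration, then takes the usual SGD step on only the highest-scoring points.

import numpy as np

# A minimal sketch of a SlimML-style SGD loop on a toy linear-regression
# task (an assumed setting, not the paper's experiments). A few aggregated
# points per iteration stand in for their member instances when estimating
# how strongly each instance would update the parameters; only the
# "critical" instances are trained on.

rng = np.random.default_rng(0)

n, d = 10_000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.05                 # assumed learning rate
batch_size = 256
keep_frac = 0.25          # assumed fraction of points kept as "critical"
n_aggregates = 8          # small number of aggregated points per iteration

for step in range(300):
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]

    # Group roughly similar points (here crudely: by label value) and
    # replace each group by one aggregated point; the aggregate's residual
    # is a cheap proxy for the members' effect on the parameter update.
    order = np.argsort(yb)
    scores = np.empty(batch_size)
    for group in np.array_split(order, n_aggregates):
        agg_x, agg_y = Xb[group].mean(axis=0), yb[group].mean()
        scores[group] = abs(agg_x @ w - agg_y)

    # Keep only the highest-scoring points; the update itself is the
    # unchanged vanilla SGD step, just on the reduced set.
    critical = np.argsort(scores)[-int(keep_frac * batch_size):]
    Xc, yc = Xb[critical], yb[critical]
    w -= lr * (2.0 / len(critical)) * Xc.T @ (Xc @ w - yc)

print("parameter error:", np.linalg.norm(w - w_true))

In the paper's setting the aggregation is constructed so that one aggregated point represents many similar input instances; the label-based grouping above merely keeps the sketch self-contained and runnable.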

Details

Language:
English
ISSN:
1041-4347
Volume:
33
Issue:
5
Database:
Complementary Index
Journal:
IEEE Transactions on Knowledge & Data Engineering
Publication Type:
Academic Journal
Accession number:
149773609
Full Text:
https://doi.org/10.1109/TKDE.2019.2951388