A Machine Learning Approach to Reduce Dimensional Space in Large Datasets
- Source :
- RUA. Repositorio Institucional de la Universidad de Alicante, Universidad de Alicante (UA), IEEE Access, Vol 8, Pp 148181-148192 (2020)
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Computing over large datasets is both an open research problem and a major challenge: massive amounts of data must be mined and processed before they can be analyzed, yet these datasets are a valuable source of information across many different, cross-cutting domains and therefore represent a unique opportunity. The growing number of environments that rely on data-intensive computation need more complex calculations than those applied to grid-based infrastructures. This paper analyzes the algorithms most commonly used to handle large datasets, a problem in which part of the research effort focuses on reducing dimensional space. We then present a novel machine learning method that reduces dimensional space in large datasets. The approach proceeds in phases: merging all datasets into a single large one, performing the Extract, Transform and Load (ETL) process, applying the Principal Component Analysis (PCA) algorithm to machine learning techniques, and finally displaying the results by means of dashboards. The major contribution of this paper is a novel architecture, divided into five phases, that presents a hybrid machine learning method for reducing dimensional space in large datasets. To verify the correctness of our proposal, we present a case study with a complex dataset, specifically an epileptic seizure recognition database. The experiments yield encouraging results that suggest the method can be applied to a large number of different domains. This work was partially funded by Grant RTI2018-094283-B-C32, ECLIPSE-UA (Spanish Ministry of Education and Science), and in part by the Lucentia AGI Grant.
This work was partially funded by GENDER-NET Plus Joint Call on Gender and UN Sustainable Development Goals (European Commission - Grant Agreement 741874), funded in Spain by “La Caixa” Foundation (ID 100010434) with code LCF/PR/DE18/52010001 to MTH.
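The merge, ETL and PCA phases named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`merge_datasets`, `etl_standardize`, `pca_reduce`), the 95% variance threshold, and the synthetic data are all assumptions made for illustration, and the machine learning and dashboard phases are left out. PCA is computed here with a plain singular value decomposition in NumPy.

```python
import numpy as np


def merge_datasets(parts):
    """Merge phase (assumed form): stack datasets that share the same columns."""
    return np.vstack(parts)


def etl_standardize(X):
    """Minimal stand-in for ETL: drop rows with missing values, z-score columns."""
    X = X[~np.isnan(X).any(axis=1)]
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std


def pca_reduce(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components that explain
    at least `variance_to_keep` of the total variance."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))
    return Xc @ Vt[:k].T, explained[:k]


# Synthetic example: 20 observed features driven by 3 hidden factors.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 20))
parts = []
for _ in range(2):
    latent = rng.normal(size=(100, 3))
    noise = 0.01 * rng.normal(size=(100, 20))
    parts.append(latent @ mixing + noise)
parts[0][0, 0] = np.nan  # one incomplete record, removed by the ETL step

merged = merge_datasets(parts)
cleaned = etl_standardize(merged)
reduced, explained = pca_reduce(cleaned)

print("original shape:", cleaned.shape)
print("reduced shape: ", reduced.shape)
print("variance kept: ", float(explained.sum()))
```

Because the synthetic features are generated from three hidden factors, a small number of components is enough to retain most of the variance; on real data the number of retained components depends on the chosen threshold.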
- Subjects :
- General Computer Science
Library science
Large dataset
02 engineering and technology
Space (commercial competition)
large dataset
020204 information systems
Political science
Machine learning
0202 electrical engineering, electronic engineering, information engineering
Dashboards
General Materials Science
European commission
Data mining
dimensionality reduction
Sustainable development
PCA
General Engineering
Cross-validation
data mining
Dimensionality reduction
ETL
Work (electrical)
Lenguajes y Sistemas Informáticos
020201 artificial intelligence & image processing
Christian ministry
lcsh:Electrical engineering. Electronics. Nuclear engineering
lcsh:TK1-9971
Arquitectura y Tecnología de Computadores
Details
- ISSN :
- 20180942
- Database :
- OpenAIRE
- Journal :
- RUA. Repositorio Institucional de la Universidad de Alicante, Universidad de Alicante (UA), IEEE Access, Vol 8, Pp 148181-148192 (2020)
- Accession number :
- edsair.doi.dedup.....c182a8cd1d324fa54a3221529bdf0638