Feature selection in high-dimensional dataset using MapReduce
- Publication Year :
- 2017
Abstract
- This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance (mRMR) algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open-source implementation based on Hadoop/Spark and illustrate its scalability on datasets involving millions of observations or features.
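To make the idea in the abstract concrete, the following is a minimal single-machine sketch of greedy minimum Redundancy Maximum Relevance (mRMR) selection written in a map/reduce style for the tall/narrow case. It is not the authors' Hadoop/Spark implementation. The assumptions are: absolute Pearson correlation stands in for the relevance and redundancy score (mutual information is the more usual choice), rows are split into chunks by hand rather than by a cluster, the target is the last column of each row, and the names `map_chunk`, `reduce_stats`, `abs_corr`, and `mrmr` are hypothetical.

```python
from functools import reduce
from math import sqrt

def map_chunk(rows):
    """Map step: partial sums and cross-products for one chunk of rows."""
    d = len(rows[0])
    sums = [0.0] * d
    prods = [[0.0] * d for _ in range(d)]
    for row in rows:
        for i in range(d):
            sums[i] += row[i]
            for j in range(d):
                prods[i][j] += row[i] * row[j]
    return len(rows), sums, prods

def reduce_stats(a, b):
    """Reduce step: the partial statistics are additive across chunks."""
    n1, s1, p1 = a
    n2, s2, p2 = b
    return (n1 + n2,
            [x + y for x, y in zip(s1, s2)],
            [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(p1, p2)])

def abs_corr(stats, i, j):
    """Absolute Pearson correlation of columns i and j from the reduced sums.
    Assumption: used here as a simple stand-in for mutual information."""
    n, sums, prods = stats
    cov = prods[i][j] - sums[i] * sums[j] / n
    var_i = prods[i][i] - sums[i] ** 2 / n
    var_j = prods[j][j] - sums[j] ** 2 / n
    if var_i <= 0 or var_j <= 0:
        return 0.0  # constant column: treat as uncorrelated
    return abs(cov) / sqrt(var_i * var_j)

def mrmr(stats, k):
    """Greedy selection: maximise relevance to the target minus the
    mean redundancy with the features already selected."""
    target = len(stats[1]) - 1  # assumption: target is the last column
    remaining = list(range(target))
    selected = []
    while remaining and len(selected) < k:
        def score(f):
            relevance = abs_corr(stats, f, target)
            if not selected:
                return relevance
            redundancy = sum(abs_corr(stats, f, s) for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (illustrative only): columns are x0, x1, x2, y with y = 2*x0 + x2.
# x1 is a noisy copy of x0, so it is relevant but redundant.
rows = [
    [1, 1.5,  1,  3], [2, 1.5, -1,  3], [3, 3.5, -1,  5], [4, 3.5,  1,  9],
    [5, 4.5,  1, 11], [6, 6.5, -1, 11], [7, 6.5, -1, 13], [8, 8.5,  1, 17],
]
chunks = [rows[:4], rows[4:]]  # each chunk plays the role of one data partition
stats = reduce(reduce_stats, map(map_chunk, chunks))
top2 = mrmr(stats, 2)
print(top2)  # [0, 2]
```

On this toy data a plain relevance ranking would pick x0 and x1, whereas the redundancy penalty makes the sketch pick x0 and then x2. Only the map and reduce steps touch the rows; the greedy loop works on the small reduced statistics, which is one plausible way to see why a tall/narrow dataset suits this pattern. The wide/short case mentioned in the abstract would need a different partitioning and is not covered here.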
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.1709.02327
- Document Type :
- Working Paper