Toward a new approach for sorting extremely large data files in the big data era
- Source :
- Cluster Computing. 22:819-828
- Publication Year :
- 2018
- Publisher :
- Springer Science and Business Media LLC, 2018.
-
Abstract
- The extensive amount of data and content generated today requires a paradigm shift in data processing and management techniques. One important data processing operation is sorting. Using multiple passes in external merge sort greatly speeds up the sorting of extremely large data files. Because swapping time dominates in many applications on large files, algorithms that minimize swap operations normally outperform those that focus only on CPU-time optimizations. For sorting extremely large files, external algorithms such as merge sort are normally used. We show that making multiple passes over the data set, as proposed in our algorithm, greatly reduces the number of swaps and thus the overall sorting time. Moreover, the proposed technique is well suited to emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique in both CPU-only and hybrid CPU–GPU implementations.
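- The paper's own multi-pass algorithm is not reproduced in this record. As background, a minimal sketch of the classic two-phase external merge sort the abstract builds on (sorted runs written to disk, then a single k-way merge pass) can be written as follows; the function name, file layout (one integer per line), and `run_size` parameter are illustrative assumptions, not the authors' implementation:

```python
import heapq
import tempfile

def external_sort(path_in, path_out, run_size=4):
    """Sort a file of one integer per line via external merge sort.

    Phase 1: read at most run_size items at a time, sort each chunk in
    memory, and write it to a temporary "run" file, so only small pieces
    of the data ever occupy RAM.
    Phase 2: k-way merge all sorted runs with a min-heap in one pass.
    """
    runs = []
    with open(path_in) as f:
        while True:
            # Pull the next run_size lines (fewer at end of file).
            chunk = [int(line) for _, line in zip(range(run_size), f)]
            if not chunk:
                break
            chunk.sort()  # in-memory sort of one run
            run = tempfile.NamedTemporaryFile("w+", delete=False)
            run.write("\n".join(map(str, chunk)) + "\n")
            run.seek(0)
            runs.append(run)
    with open(path_out, "w") as out:
        # heapq.merge lazily merges the already-sorted run streams.
        merged = heapq.merge(*[(int(line) for line in r) for r in runs])
        out.write("\n".join(map(str, merged)) + "\n")
```

The abstract's point about swap costs corresponds to phase 1 here: each element is written to disk once per pass rather than repeatedly swapped, and the merge phase reads every run sequentially, which is the access pattern that also maps well onto GPU-assisted implementations.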
- Subjects :
- Focus (computing)
Computer Networks and Communications
Computer science
business.industry
Big data
Sorting
CPU time
020206 networking & telecommunications
02 engineering and technology
Parallel computing
Data sorting
Data set
Data file
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Merge sort
business
Swap (computer programming)
Software
Details
- ISSN :
- 1573-7543 and 1386-7857
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- Cluster Computing
- Accession number :
- edsair.doi...........d1e7c2f1f41fe915f0082f51d7df9c70
- Full Text :
- https://doi.org/10.1007/s10586-018-2860-1