Toward a new approach for sorting extremely large data files in the big data era.

Authors :
Shatnawi, Ali
AlZahouri, Yathrip
Shehab, Mohammed A.
Jararweh, Yaser
Al-Ayyoub, Mahmoud
Source :
Cluster Computing. Sep 2019, Vol. 22, Issue 3, p. 819-828. 10 p.
Publication Year :
2019

Abstract

The sheer volume of data and content generated today requires a paradigm shift in how these data are processed and managed. Sorting is one of the most important data processing operations. Extremely large files are normally sorted with external algorithms such as merge sort. Because swapping time is dominant for large files in many applications, algorithms that minimize swap operations are normally superior to those that focus only on optimizing CPU time. We show that making multiple passes over the data set in external merge sort, as our algorithm proposes, greatly reduces the number of swaps and thus the overall sorting time. Moreover, the proposed technique is well suited to emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique for both "CPU only" and hybrid CPU–GPU implementations. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
13867857
Volume :
22
Issue :
3
Database :
Academic Search Index
Journal :
Cluster Computing
Publication Type :
Academic Journal
Accession number :
138109759
Full Text :
https://doi.org/10.1007/s10586-018-2860-1