GenomicsBench: A Benchmark Suite for Genomics
- Author
Somnath Paul, Arun Subramaniyan, Sanchit Misra, Tim Dunn, Vasimuddin, David Blaauw, Satish Narayanasamy, Yufeng Gu, and Reetuparna Das
- Subjects
0303 health sciences, Source code, Exploit, Computer science, Data parallelism, business.industry, media_common.quotation_subject, Suite, Genomics, Parallel computing, Microarchitecture, 03 medical and health sciences, 0302 clinical medicine, Software, Benchmark (computing), business, 030217 neurology & neurosurgery, 030304 developmental biology, media_common
- Abstract
Over the last decade, advances in high-throughput sequencing and the availability of portable sequencers have enabled fast and cheap access to genetic data. For a given sample, sequencers typically output fragments of the DNA in that sample. Depending on the sequencing technology, these fragments range from 150–250 bases at high accuracy to a few tens of thousands of bases at much lower accuracy. Sequencing data is now being produced at a rate that far outpaces Moore's law, posing significant computational challenges for commodity hardware. To meet this demand, software tools have been extensively redesigned, and new algorithms and custom hardware have been developed to handle the diversity of sequencing data. However, there is no standard set of benchmarks that captures the diverse behaviors of these recent algorithms and can facilitate future architectural exploration. To that end, we present the GenomicsBench benchmark suite, which contains 12 computationally intensive, data-parallel kernels drawn from popular bioinformatics software tools. It covers the major steps in short- and long-read genome sequence analysis pipelines, such as basecalling, sequence mapping, de novo assembly, variant calling, and polishing. We observe that while these genomics kernels have abundant data-level parallelism, it is often hard to exploit on commodity processors because of input-dependent irregularities. We also perform a detailed microarchitectural characterization of these kernels and identify their bottlenecks. GenomicsBench includes parallel versions of the source code, with CPU and GPU implementations as applicable, along with representative input datasets in two sizes: small and large.
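To make the claim about "abundant data-level parallelism" that is "hard to exploit because of input-dependent irregularities" concrete, here is a minimal, hypothetical sketch. It is not a GenomicsBench kernel. A plain dynamic-programming edit distance stands in for the alignment-style kernels, and each read pair is an independent task, so the workload is trivially data-parallel. However, the work per task grows with the product of the two sequence lengths. The read lengths (150 bases versus 20,000 bases) and the round-robin static scheduling across four workers are illustrative assumptions, chosen to mirror the short-read and long-read ranges in the abstract.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # match or mismatch
            )
        prev = cur
    return prev[-1]


def static_partition_work(lengths, n_workers):
    """Assign independent tasks round-robin; return DP cells per worker.

    Each task is a (len_a, len_b) pair, and its cost is len_a * len_b
    DP cells, which is the input-dependent part of the workload.
    """
    work = [0] * n_workers
    for k, (la, lb) in enumerate(lengths):
        work[k % n_workers] += la * lb
    return work


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))  # -> 1 (one deletion)

    # Hypothetical mix: seven short-read-sized pairs, one long-read-sized pair.
    tasks = [(150, 150)] * 7 + [(20_000, 20_000)]
    work = static_partition_work(tasks, n_workers=4)
    print(work)  # -> [45000, 45000, 45000, 400022500]
    print(max(work) / (sum(work) / len(work)))  # load-imbalance factor
```

Although every task is independent, one worker ends up with almost all of the work. A dynamic or length-aware scheduler would spread the load better, but the per-task cost still depends on the input, which static vectorization or GPU batching must also handle.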
- Published
2021