Back to Search
Start Over
CHTKC: a robust and efficient k-mer counting algorithm based on a lock-free chaining hash table
- Source :
- Briefings in Bioinformatics. 22
- Publication Year :
- 2020
- Publisher :
- Oxford University Press (OUP), 2020.
-
Abstract
- Motivation: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple, efficient counting of datasets with high sequencing depth or large genome size is a challenge. Results: We propose a robust and efficient method, CHTKC, to solve the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and handle situations where memory is not enough to accommodate all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that using a hash-table-based method to effectively solve the k-mer counting problem remains a feasible solution.
- Subjects :
- 0303 health sciences
Genome
Computer science
030302 biochemistry & molecular biology
Computational Biology
DNA
Linked list
Hash table
Substring
03 medical and health sciences
Counting problem
k-mer
Chaining
Non-blocking algorithm
Humans
Error detection and correction
Molecular Biology
Algorithm
Algorithms
030304 developmental biology
Information Systems
Subjects
Details
- ISSN :
- 14774054 and 14675463
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- Briefings in Bioinformatics
- Accession number :
- edsair.doi.dedup.....e18e57d3b6bcfd7ce2681abb180f0fda