Multiple errors correction for position-limited DNA sequences with GC balance and no homopolymer for DNA-based data storage.

Authors :
Li, Xiayang
Chen, Moxuan
Wu, Huaming
Source :
Briefings in Bioinformatics. Jan2023, Vol. 24 Issue 1, p1-11. 11p.
Publication Year :
2023

Abstract

Deoxyribonucleic acid (DNA) is an attractive medium for long-term digital data storage due to its extremely high storage density, low maintenance cost and longevity. However, during the process of synthesis, amplification and sequencing of DNA sequences with homopolymers of large run-length, three different types of errors, namely, insertion, deletion and substitution errors frequently occur. Meanwhile, DNA sequences with large imbalances between GC and AT content exhibit high dropout rates and are prone to errors. These limitations severely hinder the widespread use of DNA-based data storage. In order to reduce and correct these errors in DNA storage, this paper proposes a novel coding schema called DNA-LC, which converts binary sequences into DNA base sequences that satisfy both the GC balance and run-length constraints. Furthermore, our coding mode is able to detect and correct multiple errors with a higher error correction capability than the other methods targeting single error correction within a single strand. The decoding algorithm has been implemented in practice. Simulation results indicate that our proposed coding scheme can offer outstanding error protection to DNA sequences. The source code is freely accessible at https://github.com/XiayangLi2301/DNA. [ABSTRACT FROM AUTHOR]
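
The two constraints named in the abstract, GC balance and a run-length limit, can be illustrated with a small checker. This is a minimal sketch and not the DNA-LC encoder or decoder, whose construction is not described in this record. The function names, the `gc_tolerance` parameter and the default `max_run=1` (read from "no homopolymer" in the title) are assumptions made here for illustration only.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def max_run_length(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = current = 1 if seq else 0
    for prev, base in zip(seq, seq[1:]):
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
    return longest


def satisfies_constraints(seq: str, gc_tolerance: float = 0.0, max_run: int = 1) -> bool:
    """Check GC balance and the run-length limit.

    Assumed definitions (not taken from the paper): the sequence is
    GC-balanced when its GC fraction is within gc_tolerance of 0.5, and
    homopolymer-free when no base repeats more than max_run times in a row.
    """
    balanced = abs(gc_content(seq) - 0.5) <= gc_tolerance
    return balanced and max_run_length(seq) <= max_run
```

Under these assumed definitions, "ACGT" passes both checks, "AACG" fails the run-length check, and "ATAT" fails the GC balance check.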

Details

Language :
English
ISSN :
14675463
Volume :
24
Issue :
1
Database :
Academic Search Index
Journal :
Briefings in Bioinformatics
Publication Type :
Academic Journal
Accession number :
161419747
Full Text :
https://doi.org/10.1093/bib/bbac484