1. Sequence-Subset Distance and Coding for Error Control in DNA-Based Data Storage.
- Author
- Song, Wentu; Cai, Kui; Schouhamer Immink, Kees A.
- Subjects
- *DATA warehousing, *HAMMING distance, *ERROR-correcting codes, *COMMUNICATION models, *DISTANCES, *DNA, *ERROR correction (Information theory)
- Abstract
- The process of DNA-based data storage (DNA storage for short) can be mathematically modelled as a communication channel, termed the DNA storage channel, whose inputs and outputs are sets of unordered sequences. To design error-correcting codes for the DNA storage channel, a new metric, termed the sequence-subset distance, is introduced. It generalizes the Hamming distance to a distance function defined between any two sets of unordered vectors and helps to establish a uniform framework for designing error-correcting codes for the DNA storage channel. We further introduce a family of error-correcting codes, referred to as sequence-subset codes, for DNA storage and show that the error-correcting ability of such codes is completely determined by their minimum distance. We derive several upper bounds on the size of sequence-subset codes, including a tight bound for a special case, a Singleton-like bound, and a Plotkin-like bound. We also propose several constructions, including an optimal construction for that special case, which imply lower bounds on the size of such codes. [ABSTRACT FROM AUTHOR]
- Published
- 2020