tuple_plot: fast pairwise nucleotide sequence comparison with noise suppression
- Author
Niels Jahn, Matthias Platzer, and Karol Szafranski
- Subjects
Statistics and Probability; Theoretical computer science; Computer science; Molecular Sequence Data; Sequence alignment; Biochemistry; Genome; Plot (graphics); Sequence Homology, Nucleic Acid; Nucleotide; Molecular Biology; Time complexity; chemistry.chemical_classification; Sequence; Stochastic Processes; Models, Statistical; Base Sequence; Models, Genetic; Nucleotides; Nucleic acid sequence; Chromosome Mapping; Sequence Analysis, DNA; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; chemistry; Pairwise comparison; Artifacts; Algorithm; Sequence Alignment; Word (computer architecture); Algorithms; Software
- Abstract
Summary: The program tuple_plot identifies and visualizes local similarities between two genomic sequences, typically 100 kb or longer, using the well-known dotplot principle. A dictionary of sequence words built from the input sequences is used to construct a task-specific expectancy model, which assigns significance values to pairwise word hits. The dictionary-based approach allows fast computation, with computation time scaling as O(N log N) in the size N of the input sequences. The proposed scoring scheme appreciably increases the signal-to-noise ratio and may help to improve other word-based sequence comparison approaches.
Availability: tuple_plot is available at and may be used under the GNU General Public License.
Contact: szafrans@fli-leibniz.de
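To make the word-hit principle of the abstract concrete, here is a minimal Python sketch. It indexes every k-letter word of both sequences in a dictionary, reports each shared word occurrence as a dotplot hit, and weights the hit with a simple frequency-based expectancy model. The scoring rule (-log10 of the chance that a random position pair shares the word), the word length, the threshold and the function names `word_index`, `scored_hits` and `significant_hits` are all hypothetical choices for illustration. They are not tuple_plot's published model or code. The sketch also uses a Python hash map rather than reproducing the O(N log N) dictionary construction described above.

```python
import math
from collections import defaultdict


def word_index(seq, k):
    """Map every k-letter word of seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def scored_hits(seq_a, seq_b, k=4):
    """Return (pos_a, pos_b, score) for every position pair sharing a k-word.

    Hypothetical expectancy model (not tuple_plot's): a random position pair
    shares word w with probability f_a(w) * f_b(w), where f is the relative
    frequency of w among the k-words of each sequence. The score is -log10 of
    that probability, so hits caused by frequent (e.g. repetitive) words
    score low.
    """
    idx_a = word_index(seq_a, k)
    idx_b = word_index(seq_b, k)
    n_a = len(seq_a) - k + 1  # number of k-words in seq_a
    n_b = len(seq_b) - k + 1  # number of k-words in seq_b
    hits = []
    # Only words present in both dictionaries can produce dots.
    for word in sorted(idx_a.keys() & idx_b.keys()):
        pos_a, pos_b = idx_a[word], idx_b[word]
        p = (len(pos_a) / n_a) * (len(pos_b) / n_b)
        score = -math.log10(p)
        for i in pos_a:
            for j in pos_b:
                hits.append((i, j, score))
    return hits


def significant_hits(hits, min_score):
    """Keep only the dots whose score reaches the (hypothetical) threshold."""
    return [(i, j) for i, j, score in hits if score >= min_score]


if __name__ == "__main__":
    hits = scored_hits("ACGTACGTGG", "TTACGTCC", k=4)
    for i, j, score in hits:
        print(f"{i}\t{j}\t{score:.3f}")
    print(significant_hits(hits, min_score=1.4))
```

In this toy example the word ACGT occurs twice in the first sequence, so its hits score lower than the single-copy word TACG. A threshold of 1.4 therefore keeps only the TACG dot. Raising `min_score` drops hits from words that are common in both sequences, which is the kind of noise a scoring scheme of this type aims to suppress. Emitting every position pair costs time proportional to the number of hits, so a practical tool would also need to cap or bin highly repetitive words.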
- Published
- 2006