1. The Weighted Factors Automaton : A Tool for DNA Sequences Analysis
- Author
-
Danielle Quichaud, Farida Benmakrouha, Christiane Hespel, Benmakrouha, Farida, Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Theoretical computer science ,Computer science ,Powerset construction ,Levenshtein automaton ,Bioinformatics ,Algorithms on Strings ,Büchi automaton ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Automaton ,Nondeterministic finite automaton with ε-moves ,DNA Sequences Biasis ,010201 computation theory & mathematics ,Deterministic automaton ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,Probabilistic automaton ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Nondeterministic finite automaton ,DNA Sequences Analysis ,Algorithm ,Weighted automata - Abstract
International audience; A lot of computing tools are often used for analyzing DNA se- quences like trees, automata, dictionaries, every one being re- served for a particular problem. A. Blumer and al. have proposed a more general computing tool : the smaller automaton recogniz- ing the subwords of a text (DAWG). In this paper we propose the concept of "weighted factors au- tomaton" producing every occurrence of any factor. Its transi- tions are labeled by the read letter and weighted by the set of the indices of the factors beginnings. The factors are obtained by concatenating the read letters and the indices of the factors begin- nings are obtained by computing the intersection of the weight- ing sets, when advancing from the initial state to a final state. We think that this automaton can be more easily processed than DAWG and we present a comparison between DAWG and our automaton: the set of the factors beginnings indices and the fac- tors frequency are more easily obtained by our automaton and the restriction of our automaton to the factors of length k maintains the automaton structure, when DAWG cannot be eas- ily restricted. The applications are numerous: By selecting factors of length 1 , we obtain the coding regions, factors of length 3 , we obtain the expression level of some gene. The "weighted factors au- tomaton" allows us to find matches of pattern, to study homol- ogy, FASTA and BLAST algorithms being significantly simplified.
- Published
- 2013