1. More robust detection of motifs in coexpressed genes by using phylogenetic information
- Author
Abeer A. Fadda, Gert Thijs, Pieter Monsieurs, Jozef Vanderleyden, Kathleen Marchal, Bart De Moor, and Sigrid C. J. De Keersmaecker
- Subjects
DNA, Bacterial; COREGULATED GENES; ESCHERICHIA-COLI K-12; Yersinia pestis; Molecular Sequence Data; DNA Footprinting; FACTOR-BINDING SITES; Computational biology; Regulatory Sequences, Nucleic Acid; Phylogenetic footprinting; Biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Genome; PMRA-REGULATED GENES; MICROARRAY; Structural Biology; Consensus Sequence; Consensus sequence; Cluster Analysis; COMPUTATIONAL IDENTIFICATION; lcsh:QH301-705.5; Molecular Biology; Phylogeny; Oligonucleotide Array Sequence Analysis; Genetics; Biological data; Base Sequence; FUNCTIONAL-ANALYSIS; Phylogenetic tree; Gene Expression Profiling; Methodology Article; Applied Mathematics; Reproducibility of Results; Biology and Life Sciences; ENTERICA SEROVAR TYPHIMURIUM; Gene Expression Regulation, Bacterial; Sequence Analysis, DNA; Computer Science Applications; lcsh:Biology (General); Regulatory sequence; DISCOVERY; lcsh:R858-859.7; DNA microarray; Sequence motif; Algorithms; RESISTANCE
- Abstract
Background: Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the ratio of genes containing the motif to genes lacking it might not be high enough to allow detection by classical motif detection tools. To recover motifs that are present but not significantly enriched, we developed a procedure in which we first use phylogenetic footprinting to delineate all potential motifs in each gene. We then compare all detected motifs with one another and identify those shared by at least a few genes in the data set as potential candidates.
Results: We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome-wide expression studies. The methodology consists of four consecutive steps: 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions, followed by extraction of the regulatory motifs, and 4) screening the input intergenic sequences with the detected regulatory motif models. It proves to be a powerful tool for detecting regulatory motifs when the input data set has a low signal-to-noise ratio. A comparison of our results with those of two other motif detection algorithms highlights the robustness of our algorithm.
Conclusion: We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information.
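Illustration: the idea of keeping only candidates "shared by at least a few genes" can be shown with a toy Python sketch. This is not the authors' implementation. It swaps their alignment and clustering steps for exact k-mer matching, and the function names, parameters (`k`, `min_genes`), gene labels, and sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_candidates(conserved, k=6, min_genes=2):
    """Map each k-mer to the genes whose conserved regions contain it,
    keeping only k-mers found in at least min_genes different genes.

    Exact k-mer matching is a simplifying assumption; it stands in for
    the alignment and clustering of conserved regions."""
    genes_by_kmer = defaultdict(set)
    for gene, regions in conserved.items():
        for region in regions:
            for kmer in kmers(region.upper(), k):
                genes_by_kmer[kmer].add(gene)
    return {
        kmer: sorted(genes)
        for kmer, genes in genes_by_kmer.items()
        if len(genes) >= min_genes
    }


# Hypothetical toy input: conserved regions per gene, as phylogenetic
# footprinting might delineate them (sequences are made up).
conserved = {
    "geneA": ["TTGACATATAAT"],
    "geneB": ["CCTTGACAGG"],
    "geneC": ["GGGCCCGGG"],
}

candidates = shared_candidates(conserved, k=6, min_genes=2)
print(candidates)
```

Because candidates are counted per gene rather than tested for enrichment over the whole list, a motif present in only two of the three toy genes is still reported, which is the behaviour the abstract describes for noisy gene lists.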
- Published
- 2006