1. Toucan: deciphering the cis-regulatory logic of coregulated genes
- Author
Stein Aerts, Bart De Moor, Gert Thijs, Yves Moreau, Bert Coessens, and Mik Staes
- Subjects
dna-sequences ,Cell Cycle Proteins ,Biology ,Genome ,Intergenic region ,transcription factors ,expression ,Genetics ,regions ,Humans ,Promoter Regions, Genetic ,Gene ,model ,Binding Sites ,genomic dna ,SISTA ,Genome, Human ,Muscles ,Research Support, Non-U.S. Gov't ,binding-sites ,Computational Biology ,Articles ,E2F Transcription Factors ,DNA binding site ,DNA-Binding Proteins ,elements ,Gene Expression Regulation ,Liver ,Regulatory sequence ,computational analysis ,identification ,Human genome ,Promoter Regions (Genetics) ,Algorithms ,Software ,Transcription Factors ,Reference genome
- Abstract
Toucan is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic of a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler, which is based on Gibbs sampling. Binding sites characteristic of a gene set, and thus statistically over-represented with respect to a reference sequence set, are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html.
ispartof: Nucleic acids research vol:31 issue:6 pages:1753-1764
location: England
status: published
- Published
2003
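
The binomial over-representation test mentioned in the abstract can be illustrated with a minimal sketch. The record does not say how Toucan counts sites or estimates the background frequency, so the sketch assumes one plausible setup: each sequence either contains a given binding site or does not, and the background probability is the fraction of reference sequences containing it. The function names and the example counts are hypothetical and are not taken from Toucan.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation_p_value(hits_in_set, set_size, hits_in_reference, reference_size):
    """One-sided binomial p-value for a site being over-represented in a gene set.

    Assumption (not from the source): the background probability is the
    fraction of reference sequences that contain the site at least once.
    """
    p_background = hits_in_reference / reference_size
    return binomial_upper_tail(hits_in_set, set_size, p_background)


# Hypothetical counts: the site occurs in 8 of 10 gene-set sequences,
# but in only 100 of 1000 reference sequences.
p_value = overrepresentation_p_value(8, 10, 100, 1000)
print(f"p-value: {p_value:.3g}")
```

A small p-value here would suggest the site occurs in the gene set more often than the reference frequency predicts; when many sites are tested at once, a multiple-testing correction would normally be applied as well.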