GeneRetriever: software to extract all genes and transcripts in between two genetic markers to assist design of human custom microarrays
- Source :
- BioTechniques. 39:180-184
- Publication Year :
- 2005
- Publisher :
- Future Science Ltd, 2005.
Abstract
- Identifying the genes that are responsible for human genetic disorders with complex inheritance patterns (e.g., multigenic diseases) has proven to be more difficult than anticipated. Indeed, when the mode of inheritance is unknown, classical parametric linkage studies are not relevant. Nonparametric linkage analysis can help approximate the candidate loci, but this type of analysis often leads to the identification of large intervals (10–20 cM) that may contain hundreds of genes, thus rendering the candidate gene approach rather tedious (1). Adding an expression screen may significantly reduce the number of candidate genes. Microarray expression studies of the genes located within such genetic intervals should be performed on relevant tissues (2). Such experiments require the design of a custom microarray containing probes for all the genes located in the intervals of interest. Designing such microarrays involves gathering much additional information about these genes, in particular names, transcript accession numbers, and nucleotide sequences. Collecting these data manually is time-consuming and very error-prone. GeneRetriever is a Perl-based data mining tool developed to automate, accelerate, and secure the process of locally retrieving user-chosen comprehensive information about human genes or transcripts located between two genetic markers. As annotation strategies are specific to each database, we implemented a database parameter entry that allows collection of data from either the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov) or Ensembl (www.ensembl.org) databases (3,4). Several options then make it possible to define which data should be included in the returned gene/transcript table. These options are clustered into three parts: (i) gene-specific data; (ii) transcript-specific data; and (iii) expression analysis data.
Gene-specific options include the database identifier (either NCBI or Ensembl), HUGO Gene Nomenclature Committee symbol (5), gene description, DNA strand (plus or minus), type of gene (known or predicted), cytogenetic localization, summary of functional annotations, Entrez Gene identifier (6), web address of either the NCBI or Ensembl gene page, and the number of transcript variants. Additional transcript information is optionally available, such as Ensembl transcript or RefSeq accession numbers. Structural data, including transcript size, number of exons, and size of the longest exon, can also be added to the query. These data may be useful when working with predicted genes; for instance, the relevance of predicted genes composed of one single exon of
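
The core selection step described in the abstract (keep the genes lying between two markers, then return only the user-chosen columns) can be illustrated with a minimal Python sketch. This is not GeneRetriever's code: the original tool is written in Perl and queries NCBI or Ensembl, whereas the sketch below works on a small in-memory list. The `Gene` record, the function names, the gene symbols, and the coordinates are all hypothetical, and treating "between two markers" as overlap with the marker-bounded interval is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; fields loosely mirror the options in the abstract."""
    symbol: str
    chromosome: str
    start: int          # base-pair coordinate (made-up values below)
    end: int
    strand: str         # "+" or "-"
    n_transcripts: int


def genes_between(genes, chromosome, marker_a, marker_b):
    """Return genes on `chromosome` that overlap the interval bounded by
    the two marker positions, sorted by start coordinate.

    The markers may be given in either order. Overlap (rather than strict
    containment) is an assumption of this sketch.
    """
    lo, hi = sorted((marker_a, marker_b))
    hits = [
        g for g in genes
        if g.chromosome == chromosome and g.end >= lo and g.start <= hi
    ]
    return sorted(hits, key=lambda g: g.start)


def to_rows(genes, columns):
    """Build the output table, keeping only the user-chosen columns."""
    return [{c: getattr(g, c) for c in columns} for g in genes]


# Invented example data; none of these are real genes or coordinates.
GENES = [
    Gene("GENE_A", "1", 100, 200, "+", 2),
    Gene("GENE_B", "1", 500, 900, "-", 1),
    Gene("GENE_C", "1", 1500, 1800, "+", 3),
    Gene("GENE_D", "2", 150, 300, "+", 1),
]

selected = genes_between(GENES, "1", marker_a=1000, marker_b=150)
table = to_rows(selected, ["symbol", "strand"])
```

Here `table` holds one dictionary per retained gene, restricted to the requested columns; a real implementation would first resolve each marker to a genomic coordinate in the chosen database.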
- Subjects :
- Genetic Markers
Candidate gene
Information Storage and Retrieval
Biology
General Biochemistry, Genetics and Molecular Biology
User-Computer Interface
Databases, Genetic
RefSeq
Humans
Ensembl
Gene
Natural Language Processing
Oligonucleotide Array Sequence Analysis
Genetics
Genome, Human
Entrez Gene
Chromosome Mapping
Gene nomenclature
Computer-Aided Design
Database Management Systems
Human genome
DNA microarray
Algorithms
Software
Transcription Factors
Biotechnology
Details
- ISSN :
- 19409818 and 07366205
- Volume :
- 39
- Database :
- OpenAIRE
- Journal :
- BioTechniques
- Accession number :
- edsair.doi.dedup.....a2a7b3e890e78e310c2b36b0de5ebfff