
GeneRetriever: software to extract all genes and transcripts in between two genetic markers to assist design of human custom microarrays

Authors :
Claude Besmond
Yehuda Brody
Arnold Munnich
Stanislas Lyonnet
Mathieu Clément-Ziza
Source :
BioTechniques. 39:180-184
Publication Year :
2005
Publisher :
Future Science Ltd, 2005.

Abstract

BioTechniques Vol. 39, No. 2 (2005)

Identifying the genes that are responsible for human genetic disorders with complex inheritance patterns (e.g., multigenic diseases) has proven to be more difficult than anticipated. Indeed, when the mode of inheritance is unknown, classical parametric linkage studies are not relevant. Nonparametric linkage analysis can help approximate the candidate loci, but this type of analysis often leads to the identification of large intervals (10–20 cM) that may contain hundreds of genes, thus rendering the candidate gene approach rather tedious (1). Adding an expression screening step may significantly reduce the number of candidate genes. Microarray expression studies of the genes located within such genetic intervals should be performed on relevant tissues (2). Carrying out such experiments requires the design of a custom microarray containing probes for all the genes located in the intervals of interest. Designing such microarrays involves gathering much additional information about these genes, in particular, names, transcript accession numbers, and nucleotide sequences. Collecting these data manually is time-consuming and error-prone.

GeneRetriever is a Perl-based data mining tool developed to automate, accelerate, and secure the process of locally retrieving user-chosen comprehensive information about human genes or transcripts located between two genetic markers. As annotation strategies are specific to each database, we implemented a database parameter entry that allows collection of data from either the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov) or Ensembl (www.ensembl.org) databases (3,4). Several options then make it possible to define which data should be included in the returned gene/transcript table. These options are clustered into three parts: (i) gene-specific data; (ii) transcript-specific data; and (iii) expression analysis data.
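The core idea of retrieving genes between two markers and then choosing which columns to return can be sketched roughly as follows. This is a minimal Python illustration, not GeneRetriever's Perl code: the `Gene` record, the function names `genes_between_markers` and `build_table`, the coordinates, and the gene symbols are all hypothetical, and the choice to include genes that only partly overlap the interval is an assumption rather than something the abstract states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    symbol: str
    chromosome: str
    start: int      # base-pair coordinate where the gene starts
    end: int        # base-pair coordinate where the gene ends
    strand: str     # "+" or "-"
    gene_type: str  # "known" or "predicted"


def genes_between_markers(genes, chromosome, marker_a_pos, marker_b_pos):
    """Return genes on `chromosome` that overlap the marker-bounded interval.

    The two marker positions may be given in either order. Genes that only
    partly overlap the interval are kept (an assumption of this sketch).
    """
    lo, hi = sorted((marker_a_pos, marker_b_pos))
    hits = [
        g for g in genes
        if g.chromosome == chromosome and g.end >= lo and g.start <= hi
    ]
    return sorted(hits, key=lambda g: g.start)


def build_table(genes, columns):
    """Keep only the user-chosen columns, one dict per gene."""
    return [{col: getattr(g, col) for col in columns} for g in genes]


# Made-up annotation data, standing in for a database query result.
GENES = [
    Gene("GENE_A", "10", 1_000, 5_000, "+", "known"),
    Gene("GENE_B", "10", 20_000, 28_000, "-", "predicted"),
    Gene("GENE_C", "10", 90_000, 95_000, "+", "known"),
    Gene("GENE_D", "11", 21_000, 23_000, "+", "known"),
]

# Hypothetical marker positions on chromosome 10, given out of order.
selected = genes_between_markers(GENES, "10", 30_000, 4_000)
table = build_table(selected, ["symbol", "strand", "gene_type"])

for row in table:
    print(row)
```

In a real workflow the marker positions and gene annotations would come from NCBI or Ensembl rather than an in-memory list, and the column list would correspond to the gene-specific, transcript-specific, and expression analysis options described above.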
Gene-specific options include the database (either NCBI or Ensembl) identifier, HUGO Gene Nomenclature Committee symbol (5), gene description, DNA strand (plus or minus), type of gene (known or predicted), cytogenetic localization, summary of functional annotations, Entrez Gene identifier (6), web address of either the NCBI or Ensembl gene page, and the number of transcript variants. Additional transcript information is optionally available, such as Ensembl transcript or RefSeq accession numbers. Structural data, including transcript size, number of exons, and size of the longest exon, can also be added to the query. These data may be useful when working with predicted genes; for instance, the relevance of predicted genes composed of one single exon of

Details

ISSN :
1940-9818 and 0736-6205
Volume :
39
Database :
OpenAIRE
Journal :
BioTechniques
Accession number :
edsair.doi.dedup.....a2a7b3e890e78e310c2b36b0de5ebfff