The Genexpress IMAGE Knowledge Base of the Human Brain Transcriptome: A Prototype Integrated Resource for Functional and Computational Genomics
- Author
Christiane Matingou, David R. Cox, Ute Wirkner, Nobuo Nomura, Charles Decraene, Eric Eveno, Yves Vandenbrouck, Rémi Houlgatte, Fariza Tahi, Régine Mariage-Samson, Charles Auffray, Geneviève Piétu, Wilhelm Ansorge, Takahiro Nagase, Marie-Dominique Devignes, Nicole-Adeline Fayein, Genexpress, Centre National de la Recherche Scientifique (CNRS), European Molecular Biology Laboratory [Heidelberg] (EMBL), Stanford University, and Kazusa DNA Research Institute (KDRI)
- Subjects
Resource; Positional cloning; Databases, Factual; [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH]; Gene expression; Integrated mapping; Computational biology; Computational genomics; Biology; Genome; Human genome; Genetics; Humans; RNA, Messenger; Genetics (clinical); Gene library; Brain chemistry; Expressed sequence tag; Internet; Gene map; cDNA library; Genes; GenBank; Knowledge base; Full-length cDNA sequence
- Abstract
The genomes of individuals remain virtually unchanged throughout their lifetimes, with notable exceptions: specific rearrangements, such as those of the immunoglobulin genes in B lymphocytes and the T-cell-receptor genes in T lymphocytes, and a limited number of mutations that arise from DNA replication errors during cell division or from DNA damage induced by external agents (radiation, viruses) and that escape correction by DNA proofreading systems. In contrast, each of the hundreds of cell types that make up the human body expresses a different set of genes at different levels as a result of differentiation, development, environmental influences, disease, or treatment. Each physiological and pathological situation can thus be characterized by a specific set of gene transcripts (transcriptome) and protein products (proteome). The term transcriptome was coined by one of us (C. Auffray) in 1996 to designate entire sets of transcripts; it has since been used by others in the context of simpler unicellular model organisms such as yeast (Velculescu et al. 1997; Dujon 1998). Because gene expression is regulated largely at the level of transcription, assessing how transcriptomes vary provides an initial, global appraisal of the dynamics of this regulation. Description of the corresponding proteomes, to complement the expression profiles, is under way with the development of methods for large-scale systematic analysis of proteins (see, e.g., Humphery-Smith et al. 1997; VanBogelen et al. 1997; Anderson and Anderson 1998; Wilkins et al. 1998).
The ultimate goal of the Human Genome Project is to provide biologists with the basic knowledge needed to decipher the structure and function of all human genes and their products in relation to physiology and disease. As part of this effort, the Consortium for Integrated Molecular Analyses of Genomes and their Expression (IMAGE; Lennon et al. 1996) has established a common resource of publicly available cDNA libraries, from which the genome community has collected a wealth of sequence, map, and expression data. Of the roughly one million partial human cDNA sequences registered in GenBank, >80% have been derived from IMAGE Consortium cDNA collections (Auffray et al. 1995; Hillier et al. 1996). Several groups have developed clustering approaches to link cDNA clones and sequences derived from transcripts of the same gene, which can differ as a result of alternative initiation and termination of transcription and alternative splicing (Adams et al. 1995; Houlgatte et al. 1995; Schuler et al. 1997; Burke et al. 1998). These analyses indicate that most of the estimated 60,000–80,000 human genes (Antequera and Bird 1994; Fields et al. 1994) are already represented in the IMAGE cDNA collections. The most recent versions of the human gene map, assembled by an international consortium of laboratories using radiation-hybrid panels and physical resources, are based mostly on expressed sequence tag site (ESTS) markers derived from these IMAGE Consortium resources and provide localizations for some 30,000 distinct genes (Hudson et al. 1995; Gyapay et al. 1996; Schuler et al. 1996; Stewart et al. 1997; Deloukas et al. 1998). A wide variety of cytogenetic, genetic, and physical mapping data is available on a genome, chromosome, or local scale. Because the methods and resources used to build these maps differ in their basic principles and resolving power, integrating the maps and assessing how precisely a given gene is positioned pose fundamental and practical problems that remain largely unresolved. Even in regions where extensive genomic sequence is available, this integration remains essential for identifying the genes involved in inherited diseases and physiological traits by positional cloning.
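The clustering approaches cited above group cDNA clones and sequences that derive from transcripts of the same gene. As a hedged illustration of the general single-linkage idea only (not the procedure of any of the cited groups), the sketch below merges sequences with a union-find structure. It assumes two hypothetical kinds of evidence: 5' and 3' reads sharing a clone ID, using an invented `<clone>_<end>` naming convention, and precomputed sequence-overlap pairs whose detection is not modeled here.

```python
from collections import defaultdict

def clone_links(seq_ids):
    """Link reads of the same clone.

    Assumes hypothetical IDs of the form '<clone>_<end>', e.g. 'c1_5p'.
    """
    by_clone = defaultdict(list)
    for seq in seq_ids:
        by_clone[seq.split("_")[0]].append(seq)
    links = []
    for reads in by_clone.values():
        # Chain every read of a clone to the first one; one edge each suffices.
        links.extend((reads[0], other) for other in reads[1:])
    return links

def cluster_sequences(seq_ids, links):
    """Single-linkage clustering with union-find: any link merges two clusters."""
    parent = {seq: seq for seq in seq_ids}

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for seq in seq_ids:
        groups[find(seq)].append(seq)
    return sorted(sorted(group) for group in groups.values())

reads = ["c1_5p", "c1_3p", "c2_3p", "c3_5p", "c4_5p"]
# Hypothetical overlap hit, e.g. from an alignment step not shown here.
overlaps = [("c1_3p", "c2_3p")]
clusters = cluster_sequences(reads, clone_links(reads) + overlaps)
print(clusters)  # [['c1_3p', 'c1_5p', 'c2_3p'], ['c3_5p'], ['c4_5p']]
```

Single linkage is transitive: clone c2 joins clone c1 through one overlap with c1's 3' read, which is how transcripts of one gene can end up in one cluster even when some reads never overlap directly.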
The ongoing scale-up of human genome sequencing makes this issue even more pressing. Collecting expression profiles of human genes in the hundreds of cell types that make up the human body would further bridge the gap between genome structure and biology. The basic methods for cDNA array hybridization have been available since the inception of the genetic engineering revolution >20 years ago. Advances in robotics, imaging technologies, and informatics, the hallmarks of the genome era, made it possible to collect expression profiles by semiquantitative hybridization on thousands of cDNA targets spotted at high density on membranes (Nguyen et al. 1995; Takahashi et al. 1995; Zhao et al. 1995; Pietu et al. 1996). Only recently, however, with the emergence of novel platforms that use higher-density microarray formats and fluorescent probes, has the potential of this approach gained wider attention and acceptance in the community (Schena et al. 1995, 1996; DeRisi et al. 1996, 1997; Lockhart et al. 1996; Lashkari et al. 1997; Zhang et al. 1997; Wodicka et al. 1997; Cho et al. 1998; de Saizieu et al. 1998; Gray et al. 1998), including for other applications such as genotyping (Winzeler et al. 1998). Whether based on arrays of cDNA clones or on arrays of oligonucleotides synthesized in situ on glass, these techniques draw on the physical and/or information resources of the IMAGE Consortium. In this context, the Genexpress team developed the IMAGE concept (Auffray et al. 1995) and cofounded the IMAGE Consortium (Lennon et al. 1996). Following the initial characterization of one-third of the first IMAGE Consortium cDNA library, derived from infant brain (Soares et al. 1994), we developed the Genexpress Index (Houlgatte et al. 1995), a resource for gene discovery and for the genic map of the human genome, based on clustering of annotated cDNA clones, sequences, and ESTS genic markers assigned to specific human chromosomes.
We also established a method for semiquantitative analysis of the expression levels of thousands of gene transcripts and, by differential hybridization, identified a set of novel genes preferentially transcribed in human muscle (Pietu et al. 1996). Here we further characterize 5058 human genes represented in the infant brain library and describe a novel prototype resource for functional and computational genomics of the human brain transcriptome: the Genexpress IMAGE Knowledge Base, which integrates curated sequence, map, and expression annotations.
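Semiquantitative differential hybridization compares, for each spotted cDNA target, the signal obtained with probes from two tissues. The following is a minimal sketch of that comparison under stated assumptions: each membrane's raw intensities are normalized to their total signal, and a hypothetical 2-fold threshold defines preferential expression. The normalization and threshold actually used by Pietu et al. (1996) may differ, and the clone names and intensities are invented.

```python
import math

def normalize(intensities):
    """Scale raw spot intensities so one membrane's signals sum to 1.

    Total-signal normalization is an assumption made for this sketch.
    """
    total = sum(intensities.values())
    return {clone: value / total for clone, value in intensities.items()}

def preferentially_expressed(target, reference, min_fold=2.0, floor=1e-6):
    """Return {clone: log2 ratio} for clones at least `min_fold` higher in `target`.

    `min_fold` and `floor` are hypothetical parameters, not published values.
    """
    t, r = normalize(target), normalize(reference)
    hits = {}
    for clone in t.keys() & r.keys():  # only clones spotted on both membranes
        # `floor` avoids division by zero for spots with no detectable signal
        ratio = (t[clone] + floor) / (r[clone] + floor)
        if ratio >= min_fold:
            hits[clone] = round(math.log2(ratio), 2)
    return dict(sorted(hits.items()))

# Invented intensities for three spotted clones on two membranes.
muscle = {"cloneA": 900, "cloneB": 50, "cloneC": 50}
other_tissue = {"cloneA": 100, "cloneB": 450, "cloneC": 450}
result = preferentially_expressed(muscle, other_tissue)
print(result)  # {'cloneA': 3.17}
```

Normalizing to total signal is only one choice; a set of reference clones assumed to be uniformly expressed across tissues could serve as the scaling basis instead.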
- Published
- 1999