Descriptor: "Genome browser" / Publisher: springer science and business media llc - Searchworks@Jio Institute Digital Library Search Results

58 results on '"Genome browser"'

1. Mass spectrometry-based identification and characterization of human hypothetical proteins highlighting the inconsistency across the protein databases

Author: Johny Ijaq, Medicharla V. Jagannadham, and Neeraja Bethi
Subjects: Gel electrophoresis, 0303 health sciences, Hypothetical protein, Genome browser, Computational biology, Biology, Proteomics, 03 medical and health sciences, 0302 clinical medicine, Human proteome project, Ensembl, UniProt, 030217 neurology & neurosurgery, 030304 developmental biology, Reference genome
Abstract: A myriad of predicted proteins have been described at the genome scale, but their existence has not been confirmed at the protein level. Proteins that are predicted to be expressed from an open reading frame (ORF), but for which translation has not been demonstrated, are known as hypothetical proteins and constitute a major fraction of the human proteome. In this study, we aim to identify and characterize hypothetical proteins from human tumor cell lines, viz., HeLa, MCF7, and BT474, thus providing the analytical basis for their expression. We used gel electrophoresis followed by in-gel digestion of the selected protein lanes and subsequent LC–MS/MS analysis of protein tryptic digests. The ENSEMBL genome browser was used for genomic alignment. On searching against human hypothetical protein data from the NCBI database, 110 common proteins were identified across the three selected cell lines. Of these, 88 proteins were already functionally characterized and the remaining 22 were still unreviewed in UniProt, lacking evidence of expression at the protein level. To explore them further, following HPP guidelines, 15 proteins were selected and aligned against the human reference genome. Five hypothetical proteins were confirmed as isoforms of known proteins. We conclude that the proteomic approach used would serve as a suitable tool to validate the existence of predicted or hypothetical proteins at the protein level. The MS proteomics data have been deposited to the ProteomeXchange Consortium via PRIDE with the data set identifier PXD014258.
Published: 2020

2. Expression analysis of LTR-derived miR-1269a and target gene, KSR2 in Sebastes schlegelii

Author: Jennifer Im, Hee-Eun Lee, Ahran Kim, Do-Hyung Kim, Woo Ryung Kim, Heui-Soo Kim, Hee-Jae Cha, Yung Hyun Choi, and Suhkmann Kim
Subjects: Fish Proteins, 0106 biological sciences, 0301 basic medicine, Transposable element, Retroelements, ved/biology.organism_classification_rank.species, Sequence Homology, Retrotransposon, Genome browser, Protein Serine-Threonine Kinases, Biology, 01 natural sciences, Biochemistry, 03 medical and health sciences, microRNA, Genetics, Animals, Enhancer, 3' Untranslated Regions, Molecular Biology, Gene, Base Sequence, ved/biology, Gene Expression Profiling, Terminal Repeat Sequences, Computational Biology, Human genetics, Perciformes, MicroRNAs, 030104 developmental biology, Gene Expression Regulation, Sebastes schlegelii, 010606 plant biology & botany
Abstract: Sebastes schlegelii is a commercially important fish species found in the coastal areas of Korea. Most studies thus far have focused on environmental factors, behavioural patterns, aquaculture and diseases; genetic studies are limited, and few if any address microRNAs (miRNAs) or transposable elements (TEs). To understand the biological roles of TE-derived miR-1269a, we examined the expression patterns of miR-1269a and its target gene, KSR2, in various tissues of Sebastes schlegelii. We also performed a luciferase reporter assay in HINAE cells. The UCSC Genome Browser (https://genome.ucsc.edu/) was used to examine which TE is associated with miR-1269a. Target genes of miR-1269a were identified using miRDB (http://www.mirdb.org/) and TargetScan 7.1 (http://www.targetscan.org/vert_71/). A two-step miRNA kit, the HB miR Multi Assay Kit™ System I, was used to analyse TE-derived miRNA expression patterns. The 3′UTR of the KSR2 gene was cloned into the psiCHECK-2 vector, which was subsequently co-transfected with miR-1269a mimics into HINAE cells for the luciferase reporter assay. MiR-1269a was found to be derived from the LTR retrotransposon MLT2B. LTR-derived miR-1269a was highly expressed in the muscle, liver and gonad tissues of Sebastes schlegelii, whereas KSR2 was highly expressed in the brain. Co-transfection of KSR2 and miR-1269a mimic into HINAE cells showed high activity of miR-1269a in relation to KSR2. LTR-derived miR-1269a showed enhancer activity in relation to KSR2 in Sebastes schlegelii. The data may be used as a foundation for further investigation of the correlation between miRNAs and target genes, in addition to other functional studies of biological significance in Sebastes schlegelii.
Published: 2019

3. TeaAS: a comprehensive database for alternative splicing in tea plants (Camellia sinensis)

Author: Mengsha Tang, Yi Yue, Shengrui Liu, Dahe Qiao, Chaoling Wei, Yanlin An, Xiaozeng Mi, Zhiyu Ma, and Hui Xie
Subjects: Datasets as Topic, Plant Science, Genome browser, Biology, Transcripts, computer.software_genre, Camellia sinensis, Database, Annotation, Databases, Genetic, RNA-Seq, KEGG, Gene, Tea plant, Alternative splicing, Botany, Online database, food and beverages, Genome project, Alternative Splicing, Genome and transcriptome, RNA, Plant, QK1-989, Isoforms, computer, Reference genome
Abstract: Alternative splicing (AS) increases the diversity of transcripts and proteins through the selection of different splice sites and plays an important role in the growth, development and stress tolerance of plants. With the release of the reference genome of the tea plant (Camellia sinensis) and the development of transcriptome sequencing, researchers have reported the existence of AS in tea plants. However, there is a lack of a platform, centered on different RNA-seq datasets, that provides comprehensive information on AS. To facilitate access to information on AS and reveal the molecular function of AS in tea plants, we established the first comprehensive AS database for tea plants (TeaAS, http://www.teaas.cn/index.php). In this study, 3.96 Tb reads from 66 different RNA-seq datasets were collected to identify AS events. TeaAS supports four methods of retrieval of AS information based on gene ID, gene name, annotation (non-redundant/Kyoto encyclopedia of genes and genomes/gene ontology annotation or chromosomal location) and RNA-seq data. It integrates data pertaining to genome annotation, type of AS event, transcript sequence, and isoform expression levels from 66 RNA-seq datasets. The AS events resulting from different environmental conditions and those occurring in varied tissue types, as well as the expression levels of specific transcripts, can be clearly identified through this online database. Moreover, it provides two useful tools, Basic Local Alignment Search Tool and Generic Genome Browser, for sequence alignment and visualization of gene structure. The features of the TeaAS database make it a comprehensive AS bioinformatics platform for researchers, as well as a reference for studying AS events in woody crops. It could also be helpful for revealing the novel biological functions of AS in gene regulation in tea plants.
Published: 2021

4. Author Correction: Exploring the coronavirus pandemic with the WashU Virus Genome Browser

Author: Jennifer Flynn, Ting Wang, Xiaoyu Zhuo, Gavriel Matt, Deepak Purushotham, Changxu Fan, Daofeng Li, and Mayank N. K. Choudhary
Subjects: 2019-20 coronavirus outbreak, Coronavirus disease 2019 (COVID-19), Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Genomics, Genome browser, Biology, medicine.disease_cause, Virology, Virus, Computational biology and bioinformatics, Pandemic, medicine, Genetics, Infectious diseases, Author Correction, Coronavirus
Published: 2020

5. REPIC: a database for exploring the N6-methyladenosine methylome

Author: Shun Liu, Allen Zhu, Mengjie Chen, and Chuan He
Subjects: Cell specific, lcsh:QH426-470, biology, Database, Genome browser, Methylation, computer.software_genre, ENCODE, Tissue specificity, Chromatin, m6A modification, lcsh:Genetics, Histone, lcsh:Biology (General), DNA methylation, biology.protein, lcsh:QH301-705.5, computer
Abstract: The REPIC (RNA EPItranscriptome Collection) database records about 10 million peaks called from publicly available m6A-seq and MeRIP-seq data using our unified pipeline. These data were collected from 672 samples of 49 studies, covering 61 cell lines or tissues in 11 organisms. REPIC allows users to query N6-methyladenosine (m6A) modification sites by specific cell lines or tissue types. In addition, it integrates m6A/MeRIP-seq data with 1418 histone ChIP-seq and 118 DNase-seq data tracks from the ENCODE project in a modern genome browser to present a comprehensive atlas of m6A methylation sites, histone modification sites, and chromatin accessibility regions. REPIC is accessible at https://repicmod.uchicago.edu/repic.
Published: 2020

6. Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis

Author: Carlevaro-Fita J., Lanzos A., Feuerbach L., Hong C., Mas-Ponte D., Pedersen J. S., Abascal F., Amin S. B., Bader G. D., Barenboim J., Beroukhim R., Bertl J., Boroevich K. A., Brunak S., Campbell P. J., Chakravarty D., Chan C. W. Y., Chen K., Choi J. K., Deu-Pons J., Dhingra P., Diamanti K., Fink J. L., Fonseca N. A., Frigola J., Gambacorti Passerini C., Garsed D. W., Gerstein M., Getz G., Gonzalez-Perez A., Guo Q., Gut I. G., Haan D., Hamilton M. P., Haradhvala N. J., Harmanci A. O., Helmy M., Herrmann C., Hess J. M., Hobolth A., Hodzic E., Hornshoj H., Isaev K., Izarzugaza J. M. G., Johnson R., Johnson T. A., Juul M., Juul R. I., Kahles A., Kahraman A., Kellis M., Khurana E., Kim J., Kim J. K., Kim Y., Komorowski J., Korbel J. O., Kumar S., Larsson E., Lawrence M. S., Lee D., Lehmann K. -V., Li S., Li X., Lin Z., Liu E. M., Lochovsky L., Lou S., Madsen T., Marchal K., Martincorena I., Martinez-Fundichely A., Maruvka Y. E., McGillivray P. D., Meyerson W., Muinos F., Mularoni L., Nakagawa H., Nielsen M. M., Paczkowska M., Park K., Pich O., Pons T., Pulido-Tamayo S., Raphael B. J., Reimand J., Reyes-Salazar I., Reyna M. A., Rheinbay E., Rubin M. A., Rubio-Perez C., Sabarinathan R., Sahinalp S. C., Saksena G., Salichos L., Sander C., Schumacher S. E., Shackleton M., Shapira O., Shen C., Shrestha R., Shuai S., Sidiropoulos N., Sieverling L., Sinnott-Armstrong N., Stein L. D., Stuart J. M., Tamborero D., Tiao G., Tsunoda T., Umer H. M., Uuskula-Reimand L., Valencia A., Vazquez M., Verbeke L. P. C., Wadelius C., Wadi L., Wang J., Warrell J., Waszak S. M., Weischenfeldt J., Wheeler D. A., Wu G., Yu J., Zhang J., Zhang X., Zhang Y., Zhao Z., Zou L., and von Mering C.
Subjects: Cell- och molekylärbiologi, Medicine (miscellaneous), Genome browser, medicine.disease_cause, ANNOTATION, Genome, Genòmica comparada, 0302 clinical medicine, Neoplasms, Databases, Genetic, Cancer genomics, Medicine and Health Sciences, TRANSCRIPTION, 631/208/212/748, 610 Medicine & health, Càncer, lcsh:QH301-705.5, 0303 health sciences, MALAT1, Women's cancers Radboud Institute for Molecular Life Sciences [Radboudumc 17], article, Genomics, Phenotype, Women's cancers Radboud Institute for Health Sciences [Radboudumc 17], Cell Transformation, Neoplastic, RNA, Long Noncoding, Disease Susceptibility, REGULATOR, General Agricultural and Biological Sciences, Medical Genetics, GENES, DATABASE, Computational biology, 631/67/69, Biology, Polymorphism, Single Nucleotide, General Biochemistry, Genetics and Molecular Biology, Evolution, Molecular, 03 medical and health sciences, Long non-coding RNAs (lncRNAs), cancer, tumorigenesis, somatic mutations, Cancer genomics, Comparative genomics, medicine, Biomarkers, Tumor, Animals, Humans, Medicinsk genetik, 030304 developmental biology, Comparative genomics, LANDSCAPE, GENCODE, urogenital system, Genome, Human, Biology and Life Sciences, EVOLUTION, Genòmica, lcsh:Biology (General), PRINCIPLES, GENOME BROWSER, CRISPR-Cas Systems, Carcinogenesis, Cell and Molecular Biology, 030217 neurology & neurosurgery
Abstract: Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis. Communications Biology, 3 (1), ISSN:2399-3642
Published: 2020

7. A clinically validated whole genome pipeline for structural variant detection and analysis

Author: Alexander Kaplun, Shira Modai, Tzipora C. Falik-Zaccai, Naomi Meeks, Gregory Faust, Nir Neerman, and Limor Kalfon
Subjects: Break point, 0106 biological sciences, lcsh:QH426-470, Duplication, lcsh:Biotechnology, CNV, Population, Computational biology, Genome browser, Biology, 01 natural sciences, Genome, Deletion, 03 medical and health sciences, Gene Frequency, lcsh:TP248.13-248.65, Pipeline, Genetics, False positive paradox, Humans, education, Allele frequency, 030304 developmental biology, Whole genome sequencing, 0303 health sciences, education.field_of_study, Whole Genome Sequencing, Research, Breakpoint, Genetic Variation, Reproducibility of Results, Molecular Sequence Annotation, Clinical validation, lcsh:Genetics, Phenotype, Diagnostic console, DNA microarray, Structural variants, WGS, 010606 plant biology & botany, Biotechnology
Abstract: Background: With the continuing decrease in cost of whole genome sequencing (WGS), we have already approached the point of inflection where WGS testing has become economically feasible, facilitating broader access to the benefits that are helping to define WGS as the new diagnostic standard. WGS provides unique opportunities for detection of structural variants; however, such analyses, despite being recognized by the research community, have not previously made their way into routine clinical practice. Results: We have developed a clinically validated pipeline for highly specific and sensitive detection of structural variants based on 30X PCR-free WGS. Using a combination of breakpoint analysis of split and discordant reads, and read depth analysis, the pipeline identifies structural variants down to single base pair resolution. False positives are minimized using calculations for loss of heterozygosity and bi-modal heterozygous variant allele frequencies to enhance heterozygous deletion and duplication detection respectively. Compound and potential compound combinations of structural variants and small sequence changes are automatically detected. To facilitate clinical interpretation, identified variants are annotated with phenotype information derived from HGMD Professional and population allele frequencies derived from public and Variantyx allele frequency databases. Single base pair resolution enables easy visual inspection of potentially causal variants using the IGV genome browser as well as easy biochemical validation via PCR. Analytical and clinical sensitivity and specificity of the pipeline have been validated using analysis of Genome in a Bottle reference genomes and known positive samples confirmed by orthogonal sequencing technologies. Conclusion: Consistent read depth of PCR-free WGS enables reliable detection of structural variants of any size. Annotation at both the gene and variant level allows clinicians to match the reported patient phenotype with detected variants and confidently report causative findings in all clinical cases used for validation. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5866-z) contains supplementary material, which is available to authorized users.
Published: 2019

8. Microarray CGH analysis of hematological patients with del(20q)

Author: Yong Wang, Shuxiao Bai, Yanlei Gong, Chunxiao Wu, Suning Chen, Yafang Wu, Huiying Qiu, Jun Zhang, Juan Shen, Yongquan Xue, and Jinlan Pan
Subjects: Adult, Male, Genetics, Comparative Genomic Hybridization, Candidate gene, Microarray, medicine.diagnostic_test, Breakpoint, Chromosomes, Human, Pair 20, Hematology, Genome browser, Middle Aged, Biology, Hematologic Neoplasms, Gene duplication, medicine, Humans, Female, Chromosome Deletion, Chromosome 20, In Situ Hybridization, Fluorescence, Aged, Oligonucleotide Array Sequence Analysis, Comparative genomic hybridization, Fluorescence in situ hybridization
Abstract: Deletion of the long arm of chromosome 20 is a common abnormality underlying hematological malignancy. Using microarray comparative genomic hybridization (aCGH), we analyzed 21 patients with hematologic diseases confirmed to carry del(20q) by conventional cytogenetics and fluorescence in situ hybridization. Seventeen patients were positive for del(20q), but this deletion was not detected in four patients. All deletions detected were interstitial; continuous deletions were seen in 12 patients and discrete deletions in five. Three commonly deleted regions (CDRs) and two commonly retained regions (CRRs) were defined: CDR1 spanning 3.05 Mb (34560497-37608229) within 20q11.23, CDR2 spanning 1.76 Mb (37851501-39615698) within 20q12, CDR3 spanning 116 kb (48120412-48236791) within 20q13.13, CRR1 spanning 1.1 Mb (29374726-30428250) within 20q11.21, and CRR2 spanning 2.5 Mb (60484668-62963548) within 20q13.33. Duplications of retained regions (20q11.21) were found in five cases with similar erythroid hyperplasia (2 M6, 3 MDS). Moreover, duplication of 20p13-p11.21 was also found in two cases with M6. We searched the CDRs and CRRs for candidate genes using the UCSC Genome Browser. Our data suggest that aCGH analysis is useful for more precisely defining breakpoints on 20q. Further work is required to identify candidate pathogenic genes within these CDRs and CRRs.
Published: 2015
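
The region sizes quoted in the abstract of record 8 can be checked from its coordinates. The following is a minimal sketch, not part of the study: it assumes the coordinates are inclusive base-pair positions on chromosome 20, and the `chr20` label and helper names are illustrative only (the abstract does not state the genome build).

```python
# Start/end coordinates (bp) copied from the abstract of record 8.
REGIONS = {
    "CDR1": (34560497, 37608229),
    "CDR2": (37851501, 39615698),
    "CDR3": (48120412, 48236791),
    "CRR1": (29374726, 30428250),
    "CRR2": (60484668, 62963548),
}


def span_bp(start: int, end: int) -> int:
    """Region length in bp, treating both ends as inclusive (an assumption)."""
    return end - start + 1


def format_span(bp: int) -> str:
    """Render a length as Mb (two decimals) or kb (whole number)."""
    if bp >= 1_000_000:
        return f"{bp / 1_000_000:.2f} Mb"
    return f"{bp / 1_000:.0f} kb"


def browser_position(chrom: str, start: int, end: int) -> str:
    """Build a 'chrom:start-end' string of the kind genome browsers accept."""
    return f"{chrom}:{start}-{end}"


# Lengths of all five regions, keyed by region name.
SPANS = {name: span_bp(start, end) for name, (start, end) in REGIONS.items()}

if __name__ == "__main__":
    for name, (start, end) in REGIONS.items():
        print(name, browser_position("chr20", start, end), format_span(SPANS[name]))
```

Under these assumptions the computed lengths are 3.05 Mb, 1.76 Mb, 116 kb, 1.05 Mb and 2.48 Mb, which agree with the rounded figures in the abstract (1.1 Mb and 2.5 Mb for the two CRRs).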

9. Enhanced JBrowse plugins for epigenomics data visualization

Author: Brigitte T. Hofmeister and Robert J. Schmitz
Subjects: Epigenomics, 0301 basic medicine, Computer science, Genome browser, Genomics, Computational biology, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, Biochemistry, 03 medical and health sciences, chemistry.chemical_compound, 0302 clinical medicine, Data visualization, User experience design, Structural Biology, Databases, Genetic, Humans, Plug-in, Nucleotide Motifs, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, Visualization, Genetics, 0303 health sciences, biology, business.industry, Applied Mathematics, DNA, Computer Science Applications, 030104 developmental biology, Histone, lcsh:Biology (General), chemistry, biology.protein, RNA, lcsh:R858-859.7, DNA microarray, Sequence motif, business, computer, Software, 030217 neurology & neurosurgery
Abstract: Background: New sequencing techniques require new visualization strategies, as is the case for epigenomics data such as DNA base modifications, small non-coding RNAs, and histone modifications. Results: We present a set of plugins for the genome browser JBrowse that are targeted for epigenomics visualizations. Specifically, we have focused on visualizing DNA base modifications, small non-coding RNAs, stranded read coverage, and sequence motif density. Additionally, we present several plugins for improved user experience such as configurable, high-quality screenshots. Conclusions: In visualizing epigenomics with traditional genomics data, we see these plugins improving scientific communication and leading to discoveries within the field of epigenomics. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2160-z) contains supplementary material, which is available to authorized users.
Published: 2018

10. Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches

Author: Minami Matsui, Nyok-Sean Lau, Mika Kawashima, Ahmad Sofiman Othman, and Yuko Makita
Subjects: Latex biosynthesis, 0106 biological sciences, 0301 basic medicine, Biomedical Research, lcsh:QH426-470, Gene annotation, lcsh:Biotechnology, Genome browser, computer.software_genre, complex mixtures, 01 natural sciences, Genome, Database, Transcriptome, 03 medical and health sciences, Gene Expression Regulation, Plant, lcsh:TP248.13-248.65, R-gene, Genetics, Gene, Plant Proteins, Whole genome sequencing, biology, Sequence Analysis, RNA, Research, technology, industry, and agriculture, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, cis-1,4-polyisoprene, Gene Annotation, biology.organism_classification, body regions, Hevea brasiliensis, lcsh:Genetics, 030104 developmental biology, RNA, Plant, Hevea, DNA microarray, Databases, Nucleic Acid, computer, Genome, Plant, 010606 plant biology & botany, Biotechnology
Abstract: Background: Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis, is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies have yielded draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. Results: A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome-wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publicly available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. Conclusion: The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers working on rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/. Electronic supplementary material: The online version of this article (10.1186/s12864-017-4333-y) contains supplementary material, which is available to authorized users.
Published: 2018

11. BioNanoAnalyst: a visualisation tool to assess genome assembly quality using BioNano data

Author: Armin Scheben, Philipp E. Bayer, Yuxuan Yuan, David Edwards, and Chon-Kit Kenneth Chan
Subjects: 0301 basic medicine, Restriction enzyme cut site, BioNano, Sequence assembly, Genomics, Genome browser, Computational biology, Biology, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Genome, Structural variation, User-Computer Interface, 03 medical and health sciences, Structural Biology, Humans, Zoom, lcsh:QH301-705.5, Molecular Biology, Genetics, Internet, Genome, Human, Applied Mathematics, DNA Restriction Enzymes, Misassembly, Computer Science Applications, Optical map, 030104 developmental biology, lcsh:Biology (General), Chromosomes, Human, Pair 1, lcsh:R858-859.7, Human genome, Software, Reference genome
Abstract: Background: Reference genome assemblies are valuable, as they provide insights into gene content, genetic evolution and domestication. The higher the quality of a reference genome assembly, the more accurate the downstream analysis will be. During the last few years, major efforts have been made towards improving the quality of genome assemblies. However, erroneous and incomplete assemblies are still common. Complementary to DNA sequencing technologies, optical mapping has advanced genomic studies by facilitating the production of genome scaffolds and assessing structural variation. However, there are few tools available to comprehensively examine misassemblies in reference genome sequences using optical map data. Results: We present BioNanoAnalyst, a software package to examine genome assemblies based on restriction endonuclease cut sites and optical map data. A graphical user interface (GUI) allows users to assess reference genome sequences on different computer platforms without the requirement of programming knowledge. The zoom function makes visualisation convenient, while a GFF3 format output file gives an option to directly visualise questionable assembly regions by location and nucleotides following import into a local genome browser. Conclusions: BioNanoAnalyst is a tool to identify misassemblies in a reference genome sequence using optical map data. With the reported information, users can rapidly identify assembly errors and correct them using other software tools, which could facilitate an accurate downstream analysis. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1735-4) contains supplementary material, which is available to authorized users.
Published: 2017
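
The abstract of record 11 mentions a GFF3 output file that flags questionable assembly regions for import into a genome browser. The sketch below only illustrates what such a file can look like in general: the nine tab-separated GFF3 columns and the `##gff-version 3` header are standard, but the `FlaggedRegion` type, the `example_tool` source label, the `region` feature type and the `Note` text are assumptions and do not reproduce BioNanoAnalyst's actual output.

```python
from typing import Iterable, NamedTuple


class FlaggedRegion(NamedTuple):
    """Hypothetical record for one questionable assembly region."""
    seqid: str
    start: int  # 1-based, inclusive, as GFF3 requires
    end: int    # 1-based, inclusive
    note: str


# Characters that must be percent-encoded inside a GFF3 attribute value.
_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
            "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape_attribute(value: str) -> str:
    """Percent-encode the reserved characters of a GFF3 attribute value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def gff3_line(region: FlaggedRegion, source: str = "example_tool") -> str:
    """Format one region as a nine-column, tab-separated GFF3 feature line."""
    if not 1 <= region.start <= region.end:
        raise ValueError("GFF3 needs 1 <= start <= end")
    columns = [
        region.seqid,
        source,
        "region",           # feature type (assumed for this sketch)
        str(region.start),
        str(region.end),
        ".",                # score: not available
        ".",                # strand: not applicable
        ".",                # phase: only meaningful for CDS features
        f"Note={escape_attribute(region.note)}",
    ]
    return "\t".join(columns)


def to_gff3(regions: Iterable[FlaggedRegion]) -> str:
    """Return a complete GFF3 document with the version header."""
    lines = ["##gff-version 3"]
    lines.extend(gff3_line(region) for region in regions)
    return "\n".join(lines) + "\n"
```

A file written this way is plain text, so it can be inspected by eye before being loaded as a track in a local genome browser.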

12. SesameFG: an integrated database for the functional genomics of sesame

Author: Jingyin Yu, Yanxin Zhang, Xin Wei, Linhai Wang, Xiurong Zhang, Pan Liu, and Hao Gong
Subjects: 0301 basic medicine, Candidate gene, Genotype, Science, Genomics, Genome browser, Genes, Plant, Polymorphism, Single Nucleotide, Genome, Article, Sesamum, 03 medical and health sciences, Gene Expression Regulation, Plant, Databases, Genetic, Molecular breeding, Internet, Multidisciplinary, biology, business.industry, Gene Expression Profiling, biology.organism_classification, Biotechnology, Gene expression profiling, Phenotype, 030104 developmental biology, Medicine, business, Functional genomics, Genome, Plant, Microsatellite Repeats
Abstract: Sesame (Sesamum indicum L.) has high oil content, a small diploid genome and a short growth period, making it an attractive species for genetic studies on oilseed crops. With the advancement of next-generation sequencing technology, genomics and functional genomics research of sesame has developed quickly in the last few years, and large amounts of data have been generated. However, these results are distributed in many different publications, and there is a lack of integration. To promote functional genomics research of sesame, we collected genetic information combined with comprehensive phenotypic information and integrated them in the web-based database named SesameFG. The current version of SesameFG contains phenotypic information on agronomic traits of 705 sesame accessions, de novo assembled genomes of three sesame varieties, massive numbers of identified SNPs, gene expression profiles of five tissues, gene families, candidate genes for the important agronomic traits and genomic-SSR markers. All phenotypic and genotypic information in SesameFG is available for online queries and can be downloaded freely. SesameFG provides useful search functions and data mining tools, including Genome Browser and local BLAST services. SesameFG is freely accessible at http://ncgr.ac.cn/SesameFG/. SesameFG provides valuable resources and tools for functional genomics research and the molecular breeding of sesame.
Published: 2017

13. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

Author: Kevin Galens, Mahesh Vangala, Sonia Agrawal, Owen White, Samuel V. Angiuoli, Anup Mahurkar, Hervé Tettelin, Ricky S. Adkins, W. Florian Fricke, David R. Riley, Jonathan Crabtree, Cesar Arze, and Claire M. Fraser
Subjects: 0301 basic medicine, lcsh:QH426-470, lcsh:Biotechnology, 030106 microbiology, Microbial genomics, Cloud computing, Genome browser, Biology, computer.software_genre, Automation, 03 medical and health sciences, Whole-genome alignment, lcsh:TP248.13-248.65, Bioinformatics resource, Genetics, Comparative genomics, Virtual appliance, Database, business.industry, Computational genomics, Genomics, Automated analysis, Cloud Computing, Virtual machine, Pipeline (software), Genome, Microbial, lcsh:Genetics, Tree (data structure), 030104 developmental biology, Data mining, User interface, business, Sequence Alignment, Sequence Analysis, computer, Software, Biotechnology
Abstract: Background: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. Results: CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in
Published: 2017

14. GTB – an online genome tolerance browser

Author: Colin Campbell, Mark F. Rogers, Tom R. Gaunt, Hashem A. Shihab, and Michael Ferlaino
Subjects: 0301 basic medicine, Computer science, Variant Effect Prediction, Genome browser, Web Browser, Prediction algorithm, computer.software_genre, medicine.disease_cause, Biochemistry, Genome, 0302 clinical medicine, Structural Biology, Neoplasms, Databases, Genetic, Prediction Algorithm, Nucleotide, Variant effect prediction, chemistry.chemical_classification, Mutation, Pathogenicity Prediction, Applied Mathematics, Computer Science Applications, Mutation (genetic algorithm), Data mining, DNA microarray, Pathogenicity prediction, Algorithms, In silico, Computational biology, Database, 03 medical and health sciences, medicine, Humans, Genome tolerance, Molecular Biology, Gene, Homeodomain Proteins, Internet, Genome Tolerance, Genome, Human, Models, Theoretical, Genome Browser, 030104 developmental biology, Receptors, LDL, chemistry, SNVs, Human genome, computer, 030217 neurology & neurosurgery
Abstract: Background Accurate methods capable of predicting the impact of single nucleotide variants (SNVs) are assuming ever increasing importance. There exists a plethora of in silico algorithms designed to help identify and prioritize SNVs across the human genome for further investigation. However, no tool exists to visualize the predicted tolerance of the genome to mutation, or the similarities between these methods. Results We present the Genome Tolerance Browser (GTB, http://gtb.biocompute.org.uk): an online genome browser for visualizing the predicted tolerance of the genome to mutation. The server summarizes several in silico prediction algorithms and conservation scores: including 13 genome-wide prediction algorithms and conservation scores, 12 non-synonymous prediction algorithms and four cancer-specific algorithms. Conclusion The GTB enables users to visualize the similarities and differences between several prediction algorithms and to upload their own data as additional tracks; thereby facilitating the rapid identification of potential regions of interest. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1436-4) contains supplementary material, which is available to authorized users.
Published: 2017

15. Temperature-dependent sRNA transcriptome of the Lyme disease spirochete

Author: Meghan Lybecker, Ivana Bilusic, Niko Popitsch, Renée Schroeder, and Philipp Rescheneder
Subjects: 0301 basic medicine, Small RNA, Intragenic RNA (intraRNA), 030106 microbiology, sRNA transcriptome, Genome browser, Biology, Genome, Transcriptome, Open Reading Frames, 03 medical and health sciences, Genetics, Humans, Borrelia burgdorferi, Gene, Repetitive Sequences, Nucleic Acid, Regulation of gene expression, Lyme Disease, Gene Expression Profiling, Temperature, Computational Biology, High-Throughput Nucleotide Sequencing, Gene Expression Regulation, Bacterial, biology.organism_classification, RNA, Bacterial, Regulatory RNA, Riboswitch, RNA, Small Untranslated, Antisense RNA (asRNA), DNA microarray, Research Article, Biotechnology
Abstract: Background Transmission of Borrelia burgdorferi from its tick vector to a vertebrate host requires extensive reprogramming of gene expression. Small regulatory RNAs (sRNA) have emerged in the last decade as important regulators of bacterial gene expression. Despite the widespread observation of sRNA-mediated gene regulation, only one sRNA has been characterized in the Lyme disease spirochete B. burgdorferi. We employed an sRNA-specific deep-sequencing approach to identify the small RNA transcriptome of B. burgdorferi at both 23 °C and 37 °C, which mimics in vitro the transmission from the tick vector to the mammalian host. Results We identified over 1000 sRNAs in B. burgdorferi, revealing large amounts of antisense and intragenic sRNAs, as well as characteristic intergenic and 5′ UTR-associated sRNAs. A large fraction of the novel sRNAs (43%) are temperature-dependent and differentially expressed at the two temperatures, suggesting a role in gene regulation for adaptation during transmission. In addition, many genes important for maintenance of Borrelia during its enzootic cycle are associated with antisense RNAs or 5′ UTR sRNAs. RNA-seq data were validated for twenty-two of the sRNAs via Northern blot analyses. Conclusions Our study demonstrates that sRNAs are abundant and differentially expressed under different environmental conditions, suggesting that gene regulation via sRNAs is a common mechanism utilized in B. burgdorferi. In addition, the identification of antisense and intragenic sRNAs impacts the broadly used loss-of-function genetic approach used to study gene function and increases the coding potential of a small genome. To facilitate access to the analyzed RNA-seq data we have set up a website at http://www.cibiv.at/~niko/bbdb/ that includes a UCSC browser track hub. By clicking on the respective link, researchers can interactively inspect the data in the UCSC genome browser (Kent et al., Genome Res 12:996-1006, 2002).
Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3398-3) contains supplementary material, which is available to authorized users.
Published: 2017

16. What everybody should know about the rat genome and its online resources

Author: Howard J. Jacob, Jim Kent, Xosé M. Fernández-Suárez, Ewan Birney, George M. Weinstock, Kim D. Pruitt, Richard A. Gibbs, Kim C. Worley, Donna Maglott, Donna Karolchik, Garth Brown, and Simon N. Twigger
Subjects: Complex disease, Genome browser, Computational biology, Biology, Polymorphism, Single Nucleotide, Genome, Rats, Mutant Strains, Article, Rat Genome Database, Databases, Genetic, Genetics, RefSeq, Animals, Humans, Ensembl, Gene, Whole genome sequencing, Internet, Genetic Diseases, Inborn, Computational Biology, Genetic Variation, Genomics, Sequence Analysis, DNA, Rats, Disease Models, Animal, Haplotypes
Abstract: It has been four years since the original publication of the draft sequence of the rat genome. Five groups are now working together to assemble, annotate and release an updated version of the rat genome. Because the rat is the prevailing model for physiology, complex disease and pharmacological studies, there is an acute need for its genomic resources to keep pace with its prominence in the laboratory. In this commentary, we describe the current status of the rat genome sequence and the plans for its impending 'upgrade'. We then cover the key online resources providing access to the rat genome, including the new SNP views at Ensembl, the RefSeq and Genes databases at the US National Center for Biotechnology Information, the Genome Browser at the University of California Santa Cruz and the disease portals for cardiovascular disease and obesity at the Rat Genome Database.
Published: 2008

17. An expression atlas of rice mRNAs and small RNAs

Author: R. C. Venu, Cheng Lu, Kalyan Vemaraju, Pamela J. Green, Manoj Pillay, Blake C. Meyers, Karthik Kulkarni, Wenzhong Wang, André Beló, Kan Nobuta, and Guo-Liang Wang
Subjects: Genetics, Small RNA, Transcription, Genetic, Sequence Analysis, RNA, Sequence analysis, Biomedical Engineering, food and beverages, RNA, Oryza, Bioengineering, Genomics, Genome browser, Biology, Applied Microbiology and Biotechnology, Genome, Massively parallel signature sequencing, Gene Expression Regulation, Plant, RNA, Plant, RNA, Small Nuclear, Molecular Medicine, RNA, Messenger, Gene, Genome, Plant, Gene Library, Biotechnology
Abstract: Identification of all expressed transcripts in a sequenced genome is essential both for genome analysis and for realization of the goals of systems biology. We used the transcriptional profiling technology called 'massively parallel signature sequencing' to develop a comprehensive expression atlas of rice (Oryza sativa cv Nipponbare). We sequenced 46,971,553 mRNA transcripts from 22 libraries, and 2,953,855 small RNAs from 3 libraries. The data demonstrate widespread transcription throughout the genome, including sense expression of at least 25,500 annotated genes and antisense expression of nearly 9,000 annotated genes. An additional set of approximately 15,000 mRNA signatures mapped to unannotated genomic regions. The majority of the small RNA data represented lower abundance short interfering RNAs that match repetitive sequences, intergenic regions and genes. Among these, numerous clusters of highly regulated small RNAs were readily observed. We developed a genome browser (http://mpss.udel.edu/rice) for public access to the transcriptional profiling data for this important crop.
Published: 2007

18. BioWardrobe: an integrated platform for analysis of epigenomics and transcriptomics data

Author: Andrey V. Kartashov and Artem Barski
Subjects: Epigenomics, Jumonji Domain-Containing Histone Demethylases, Computer science, 0206 medical engineering, Experiment management, Scalable Vector Graphics, Gene Expression, 02 engineering and technology, Genome browser, Biology, computer.software_genre, Histones, Transcriptome, Mice, 03 medical and health sciences, Software, Gene expression, Animals, Humans, 030304 developmental biology, Genetics, 0303 health sciences, Database, business.industry, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, RNA, T-Lymphocytes, Helper-Inducer, computer.file_format, Data science, Chromatin, DNA-Binding Proteins, User interface, business, computer, 020602 bioinformatics, Chromatin Immunoprecipitation Sequencing
Abstract: Development of next-generation sequencing has revolutionized molecular biology by enhancing the ability to perform genome-wide studies. However, due to the need for bioinformatic expertise and the size of resulting datasets, use of these technologies is still beyond the capabilities of many laboratories. Herein, we present the Wardrobe Experiment Management System, which allows users to store, visualize and analyze epigenomic and transcriptomic next-generation sequencing data using a biologist-friendly, web-based graphical user interface without the need for programming expertise. Wardrobe can be installed on consumer-class hardware within an institutional local network. Analysis capabilities include predefined pipelines that allow the user to download data from either institutional core facilities or public databases, perform quality control, map reads and visualize data on a built-in mirror of the University of California, Santa Cruz (UCSC) genome browser. Reads per kilobase of transcript per million reads mapped (RPKMs) are calculated for RNA sequencing (RNA-Seq), and islands of enrichment are identified for chromatin immunoprecipitation sequencing (ChIP-Seq) and similar datasets. Advanced analysis capabilities include analyzing differential gene expression and binding and creating average tag density profiles and heatmaps. The Wardrobe package and documentation are available at https://biowardrobe.com. A limited-functionality demo version is available at http://demo.biowardrobe.com
Published: 2015

19. RNASeqBrowser: A genome browser for simultaneous visualization of raw strand specific RNAseq reads and UCSC genome browser custom tracks

Author: John Lai, Melanie Lehman, Chenwei Wang, Atul Sajjanhar, Colleen C. Nelson, Jiyuan An, David L. A. Wood, and Gregor Tevz
Subjects: 060102 Bioinformatics, Genome browser, SNP, RNA-Seq, Computational biology, Biology, Genome, Nucleic acid secondary structure, simultaneous visualization, Software, INDEL Mutation, Databases, Genetic, Genetics, 111203 Cancer Genetics, Internet, Sequence Analysis, RNA, business.industry, raw strand-specific RNAseq, Computational Biology, RNA secondary structure, RNA -seq, Visualization, custom tracks, The Internet, RNA-seq, UCSC genome browser, DNA microarray, business, Biotechnology
Abstract: Background Strand-specific RNAseq data is now more common in RNAseq projects. Visualizing RNAseq data has become an important part of the analysis of sequencing data. The most widely used visualization tool is the UCSC genome browser, which introduced the custom track concept that enables researchers to simultaneously visualize gene expression at a particular locus from multiple experiments. The objective of our software tool is to provide a friendly interface for visualization of RNAseq datasets. Results This paper introduces a visualization tool (RNASeqBrowser) that incorporates and extends the functionality of the UCSC genome browser. For example, RNASeqBrowser simultaneously displays read coverage, SNPs, InDels and raw read tracks with other BED and wiggle tracks -- all being dynamically built from the BAM file. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. Strand-specific RNAseq data is also supported by RNASeqBrowser, which displays reads above (positive strand transcripts) or below (negative strand transcripts) a central line. Finally, RNASeqBrowser was designed for ease of use for users with few bioinformatic skills, and incorporates the features of many genome browsers into one platform. Conclusions The features of RNASeqBrowser: (1) RNASeqBrowser integrates the UCSC genome browser and NGS visualization tools such as IGV. It extends the functionality of the UCSC genome browser by adding several new types of tracks to show NGS data such as individual raw reads, SNPs and InDels. (2) RNASeqBrowser can dynamically generate RNA secondary structure. It is useful for identifying non-coding RNA such as miRNA. (3) Overlaying NGS wiggle data is helpful in displaying differential expression and is simple to implement in RNASeqBrowser. (4) NGS data accumulates a lot of raw reads. Thus, RNASeqBrowser collapses exact duplicate reads to reduce visualization space.
Normal PCs can show many windows of NGS individual raw reads without much delay. (5) Multiple popup windows of individual raw reads provide users with more viewing space. This avoids the approach of existing tools (such as IGV), which squeeze all raw reads into one window. This will be helpful for visualizing multiple datasets simultaneously. RNASeqBrowser and its manual are freely available at http://www.australianprostatecentre.org/research/software/rnaseqbrowser or http://sourceforge.net/projects/rnaseqbrowser/ Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1346-2) contains supplementary material, which is available to authorized users.
Published: 2015

20. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia

Author: Cheuk Chuen Siow, Avirup Dutta, Shi Yang Tan, Hamed Heydari, Naresh V. R. Mutha, Siew Woh Choo, Mia Yang Ang, Wei Yee Wee, Guat Jah Wong, and Nicholas S. Jakubovics
Subjects: Yersinia Infections, YersiniaBase, Genomics, Computational biology, Genome browser, Biochemistry, Genome, Microbiology, Database, User-Computer Interface, Pathogenomics, Structural Biology, Databases, Genetic, Humans, Yersinia pseudotuberculosis, Genomic resources, Yersinia enterocolitica, Molecular Biology, Phylogeny, Internet, Virulence, biology, Applied Mathematics, Comparative analysis, Chromosome Mapping, biology.organism_classification, Yersinia, Computer Science Applications, Search Engine, Yersinia pestis, bacteria, Genome, Bacterial, Software
Abstract: Background Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly growing Yersinia genomic data and to provide analysis tools, particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. Description To facilitate ongoing and future research on Yersinia, especially those species generally considered non-pathogenic, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include the JavaScript-based genome browser (JBrowse) and the Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic trees of Yersinia.
We ran analyses based on the tools and genomic data in YersiniaBase and the preliminary results showed differences in virulence genes found in Yersinia pestis and Yersinia pseudotuberculosis compared to other Yersinia species, and differences between Yersinia enterocolitica subsp. enterocolitica and Yersinia enterocolitica subsp. palearctica. Conclusions YersiniaBase offers free access to a wide range of genomic data and analysis tools for the analysis of Yersinia. YersiniaBase can be accessed at http://yersinia.um.edu.my. Electronic supplementary material The online version of this article (doi:10.1186/s12859-014-0422-y) contains supplementary material, which is available to authorized users.
Published: 2015

21. Using Genomic Databases for Sequence-Based Biological Discovery

Author: Andreas D. Baxevanis
Subjects: business.industry, Genomics, Genome browser, Genome, Data science, Documentation, Genetics, Molecular Medicine, Ensembl, The Internet, Human genome, business, Molecular Biology, Genetics (clinical), Sequence (medicine)
Abstract: The inherent potential underlying the sequence data produced by the International Human Genome Sequencing Consortium and other systematic sequencing projects is, obviously, tremendous. As such, it becomes increasingly important that all biologists have the ability to navigate through and cull important information from key publicly available databases. The continued rapid rise in available sequence information, particularly as model organism data is generated at breakneck speed, also underscores the necessity for all biologists to learn how to effectively make their way through the expanding “sequence information space.” This review discusses some of the more commonly used tools for sequence discovery; tools have been developed for the effective and efficient mining of sequence information. These include LocusLink, which provides a gene-centric view of sequence-based information, as well as the 3 major genome browsers: the National Center for Biotechnology Information Map Viewer, the University of California Santa Cruz Genome Browser, and the European Bioinformatics Institute’s Ensembl system. An overview of the types of information available through each of these front-ends is given, as well as information on tutorials and other documentation intended to increase the reader’s familiarity with these tools.
Published: 2003

22. A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi

Author: Joseph F. Ryan, Katherine M. Siewert, Anh-Dao Nguyen, R. Travis Moreland, Andreas D. Baxevanis, Christine E. Schnitzler, Tyra G. Wolfsberg, and Bernard Koch
Subjects: Genetics, Internet, Genome, business.industry, Mnemiopsis, Ctenophora, Genome browser, Customized Web portal, Genome project, Computational biology, Biology, biology.organism_classification, Mnemiopsis leidyi, Database, Data sequences, Data visualization, Gene wiki, GenBank, Animals, business, Biotechnology, Web site
Abstract: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo solely using next-generation sequencing technologies. The Mnemiopsis Genome Project Portal ( http://research.nhgri.nih.gov/mnemiopsis/ ) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization. We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best-serves the specific needs of the scientific community.
Published: 2014

23. SFGD: a comprehensive platform for mining functional information from soybean transcriptome data and its use in identifying acyl-lipid metabolism pathways

Author: Zhenhai Zhang, Juan Yu, Jiangang Wei, Wenying Xu, Yi Ling, and Zhen Su
Subjects: Genomics, Genome browser, Biology, Proteomics, Genome, Database, Transcriptome, Gene Expression Regulation, Plant, Databases, Genetic, Genetics, Data Mining, Gene Regulatory Networks, Nucleotide Motifs, Promoter Regions, Genetic, Internet, Gene Expression Profiling, Computational Biology, High-Throughput Nucleotide Sequencing, food and beverages, Lipid Metabolism, Gene expression profiling, MicroRNAs, Organ Specificity, Soybeans, DNA microarray, Functional genomics, Algorithms, Metabolic Networks and Pathways, Biotechnology
Abstract: Soybean (Glycine max L.) is one of the world’s most important leguminous crops producing high-quality protein and oil. Increasing the relative oil concentration in soybean seeds is many researchers’ goal, but a complete analysis platform of functional annotation for the genes involved in the soybean acyl-lipid pathway is still lacking. Following the success of soybean whole-genome sequencing, functional annotation has become a major challenge for the scientific community. Whole-genome transcriptome analysis is a powerful way to predict genes with biological functions. It is essential to build a comprehensive analysis platform for integrating soybean whole-genome sequencing data, the available transcriptome data and protein information. This platform could also be used to identify acyl-lipid metabolism pathways. In this study, we describe our construction of the Soybean Functional Genomics Database (SFGD) using Generic Genome Browser (Gbrowse) as the core platform. We integrated microarray expression profiling with 255 samples from 14 groups’ experiments and mRNA-seq data with 30 samples from four groups’ experiments, including spatial and temporal transcriptome data for different soybean development stages and environmental stresses. The SFGD includes a gene co-expression regulatory network containing 23,267 genes and 1873 miRNA-target pairs, and a group of acyl-lipid pathways containing 221 enzymes and more than 1550 genes. The SFGD also provides some key analysis tools, i.e. BLAST search, expression pattern search and cis-element significance analysis, as well as gene ontology information search and single nucleotide polymorphism display. The SFGD is a comprehensive database integrating genome and transcriptome data, and also for soybean acyl-lipid metabolism pathways. 
It provides useful toolboxes for biologists to improve the accuracy and robustness of soybean functional genomics analysis, further improving understanding of gene regulatory networks for effective crop improvement. The SFGD is publicly accessible at http://bioinformatics.cau.edu.cn/SFGD/, with all data available for downloading.
Published: 2014

24. What google maps can do for biomedical data dissemination: examples and a design study

Author: Radu Jianu and David H. Laidlaw
Subjects: QA75, Biomedical Research, Bioinformatics, Interface (Java), Computer science, media_common.quotation_subject, Information Storage and Retrieval, 02 engineering and technology, Genome browser, Data type, General Biochemistry, Genetics and Molecular Biology, World Wide Web, 03 medical and health sciences, Interaction network, Biological visualization, Data dissemination, 0202 electrical engineering, electronic engineering, information engineering, Regulation networks, Software system, 030304 developmental biology, media_common, Information Services, Medicine(all), Internet, 0303 health sciences, Creative visualization, Biological data, Biochemistry, Genetics and Molecular Biology(all), 020207 software engineering, General Medicine, Data science, Visualization, Design guidelines, RA, Research Article, Maps as Topic
Abstract: BACKGROUND: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed too time-consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data. RESULTS: We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers. CONCLUSIONS: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets.
We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.
Published: 2013

25. BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge

Author: Marylyn D. Ritchie, Alex T. Frase, John R. Wallace, Carrie B. Moore, and Sarah A. Pendergrass
Subjects: dbSNP, Caveolin 2, Kruppel-Like Transcription Factors, Nerve Tissue Proteins, Genomics, Genome-wide association study, Genome browser, Biology, Bioinformatics, 03 medical and health sciences, 0302 clinical medicine, Zinc Finger Protein Gli3, Databases, Genetic, Genetics, Humans, Abnormalities, Multiple, Computer Simulation, Exome, Protein Families Database, 1000 Genomes Project, KEGG, Genetics (clinical), Randomized Controlled Trials as Topic, 030304 developmental biology, 0303 health sciences, Biological data, Genome, Human, Research, Computational Biology, Genetic Variation, Hematologic Diseases, Polydactyly, Phenotype, Vestibular Diseases, Case-Control Studies, Face, 030220 oncology & carcinogenesis, Software, Genome-Wide Association Study
Abstract: Background With the recent decreasing cost of genome sequence data, there has been increasing interest in rare variants and methods to detect their association to disease. We developed BioBin, a flexible collapsing method inspired by biological knowledge that can be used to automate the binning of low frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases, which contains resources such as: dbSNP and gene Entrez database information from the National Center for Biotechnology Information (NCBI), pathway information from Gene Ontology (GO), Protein families database (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath - signal transduction pathways, Open Regulatory Annotation Database (ORegAnno), Biological General Repository for Interaction Datasets (BioGrid), Pharmacogenomics Knowledge Base (PharmGKB), Molecular INTeraction database (MINT), and evolutionary conserved regions (ECRs) from UCSC Genome Browser. The novelty of BioBin is access to comprehensive knowledge-guided multi-level binning. For example, bin boundaries can be formed using genomic locations from: functional regions, evolutionary conserved regions, genes, and/or pathways. Methods We tested BioBin using simulated data and 1000 Genomes Project low coverage data to test our method with simulated causative variants and a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project Kabuki dataset (Kabuki syndrome is a congenital disorder affecting multiple organs and often involving intellectual disability), contrasted with Complete Genomics data as controls. Results The results from our simulation studies indicate that the type I error rate is controlled; however, power falls quickly for small sample sizes using variants with modest effect sizes.
Using BioBin, we were able to find simulated variants in genes with fewer than 20 loci, but found the sensitivity to be much lower in large bins. We also highlighted the scale of population stratification between two 1000 Genomes Project populations, CEU and YRI. Lastly, we were able to apply BioBin to natural biological data from dbGaP and identify an interesting candidate gene for further study. Conclusions We have established that BioBin will be a very practical and flexible tool to analyze sequence data and potentially uncover novel associations between low frequency variants and complex disease.
Published: 2013

26. NGS-Trex: Next Generation Sequencing Transcriptome profile explorer

Author: Lara Boatti, Ilenia Boria, Graziano Pesole, and Flavio Mignone
Subjects: Sequence Analysis, RNA, Research, Applied Mathematics, Computational Biology, High-Throughput Nucleotide Sequencing, Genome browser, Biology, computer.software_genre, Biochemistry, DNA sequencing, Computer Science Applications, Upload, Identification (information), Annotation, Workflow, Structural Biology, Data Mining, Data mining, DNA microarray, User interface, Transcriptome, Molecular Biology, computer, Software
Abstract: Next-Generation Sequencing (NGS) technology has exceptionally increased the ability to sequence DNA in a massively parallel and cost-effective manner. Nevertheless, NGS data analysis requires bioinformatics skills and computational resources well beyond the possibilities of many "wet biology" laboratories. Moreover, most projects require only a few sequencing cycles and standard tools or workflows to carry out suitable analyses for the identification and annotation of genes, transcripts and splice variants found in the biological samples under investigation. These projects can benefit from the availability of easy-to-use systems that automatically analyse sequences and mine data without first requiring a strong bioinformatics background or hardware infrastructure. To address this issue we developed an automatic system targeted to the analysis of NGS data obtained from large-scale transcriptome studies. This system, which we named NGS-Trex (NGS Transcriptome profile explorer), is available through a simple web interface (http://www.ngs-trex.org) and allows the user to upload raw sequences and easily obtain an accurate characterization of the transcriptome profile after setting a few parameters required to tune the analysis procedure. The system is also able to assess differential expression at both gene and transcript level (i.e. splicing isoforms) by comparing the expression profiles of different samples. By using simple query forms the user can obtain lists of genes, transcripts and splice sites ranked and filtered according to several criteria. Data can be viewed as tables, text files or through a simple genome browser which helps the visual inspection of the data. NGS-Trex is a simple tool for RNA-Seq data analysis mainly targeted to "wet biology" researchers with limited bioinformatics skills. It offers simple data mining tools to explore transcriptome profiles of samples investigated taking advantage of NGS technologies.
Published: 2013

27. Comparative analysis and visualization of multiple collinear genomes

Author: Leonard McMillan, Jeremy Wang, and Fernando Pardo-Manuel de Villena
Subjects: Genomic data, ved/biology.organism_classification_rank.species, Mice, Inbred Strains, Genome browser, Computational biology, Biology, Polymorphism, Single Nucleotide, Biochemistry, Genome, Mice, 03 medical and health sciences, 0302 clinical medicine, Structural Biology, Phylogenetics, Animals, Cluster Analysis, Model organism, Molecular Biology, Phylogeny, 030304 developmental biology, Internet, 0303 health sciences, ved/biology, Applied Mathematics, Computer Science Applications, Visualization, Proceedings, ComputingMethodologies_PATTERNRECOGNITION, Analysis tools, DNA microarray, Software, 030217 neurology & neurosurgery
Abstract: Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains.
Published: 2012

28. A Draft Sequence of the Puerto Rican Parrot Genome (Amazona vittata) – a Genome Project funded by a Local Community Effort

Author: Taras Oleksyk
Subjects: Genetics, Critically endangered, Contig, biology, Evolutionary biology, Sequence assembly, General Materials Science, Genome project, Genome browser, biology.organism_classification, Genome, DNA sequencing, Amazona vittata
Abstract: The genome of the Puerto Rican parrot (Amazona vittata) has been sequenced and assembled in an international collaboration supported by many individual donations from the people of Puerto Rico. This is a critically endangered endemic bird, the only surviving native parrot species in the territory of the United States, and the first parrot belonging to the large genus Amazona to have its genome sequenced and assembled. A genome of one A. vittata female was sequenced resulting in a total of almost 42.5 billion nucleotide bases equivalent to 26.89X average coverage depth. After filtering out the short fragments (
Published: 2011

29. Visualization and Exploration of Conserved Regulatory Modules Using ReXSpecies 2

Author: Hans R. Schöler, Georg Fuellen, Daniel Esch, and Stephan Struckmann
Subjects: Genetics, Binding Sites, Phylogenetic tree, Evolution, Software Validation, Computational Biology, Context (language use), Genome browser, Computational biology, Phylogenetic footprinting, Biology, Set (abstract data type), DNA binding site, Identification (information), QH359-425, Luciferases, Promoter Regions, Genetic, Octamer Transcription Factor-3, Software, Ecology, Evolution, Behavior and Systematics, Position-Specific Scoring Matrices, Transcription Factors
Abstract: Background The prediction of transcription factor binding sites is difficult for many reasons. Thus, filtering methods are needed to enrich for biologically relevant (true positive) matches in the large amount of computational predictions that are frequently generated from promoter sequences. Results ReXSpecies 2 filters predictions of transcription factor binding sites and generates a set of figures displaying them in evolutionary context. More specifically, it uses position specific scoring matrices to search for motifs that specify transcription factor binding sites. It removes redundant matches and filters the remaining matches by the phylogenetic group that the matrices belong to. It then identifies potential transcriptional modules, and generates figures that highlight such modules, taking evolution into consideration. Module formation, scoring by evolutionary criteria and visual clues reduce the amount of predictions to a manageable scale. Identification of transcription factor binding sites of particular functional importance is left to expert filtering. ReXSpecies 2 interacts with genome browsers to enable scientists to filter predictions together with other sequence-related data. Conclusions Based on ReXSpecies 2, we derive plausible hypotheses about the regulation of pluripotency. Our tool is designed to analyze transcription factor binding site predictions considering their common pattern of occurrence, highlighting their evolutionary history.
Published: 2011

30. CloVR-Microbe: Assembly, gene finding and functional annotation of raw sequence data from single microbial genome projects – standard operating procedure, version 1.0

Author: James White, Owen White, Samuel Angiuoli, W. Florian Fricke, Kevin Galens, Cesar Arze, Malcolm Matalka, Michelle Gwinn Giglio, and The CloVR Team
Subjects: Annotation, Contig, GenBank, Gene prediction, General Materials Science, Data mining, Genome browser, Shotgun Sequence Assembly, Biology, computer.software_genre, computer, Pipeline (software), Sequence (medicine)
Abstract: The CloVR-Microbe pipeline performs the basic processing and analysis steps required for standard microbial single-genome sequencing projects: A) Whole-genome shotgun sequence assembly; B) Identification of protein and RNA-coding genes; and C) Functional gene annotation. B) and C) are based on the IGS Annotation Engine (http://ae.igs.umaryland.edu/), which is described elsewhere (K Galens et al. submitted). The assembly component of CloVR-Microbe can be executed independently from the gene identification and annotation components. Alternatively, pre-assembled sequence contigs can be used to perform gene identifications and annotations. The pipeline input may consist of unassembled raw sequence reads from the Sanger, Roche/454 GS FLX or Illumina GAII or HiSeq sequencing platforms or of combinations of Sanger and Roche/454 sequence data. The pipeline output consists of results and summary files generated during the different pipeline steps. Annotated sequence files are generated that are compatible with common genome browser tools and can be submitted to the GenBank repository at NCBI. This protocol is available in CloVR beta versions 0.5 and 0.6.
Published: 2011

31. Rice-Map: a new-generation rice genome browser

Author: Ge Gao, Xiaocheng Gu, Jun Wang, Liang Tang, Zhe Li, Shuqi Zhao, Lei Kong, Jingchu Luo, and He Zhang
Subjects: Epigenomics, Genetic Markers, DNA, Plant, lcsh:QH426-470, lcsh:Biotechnology, Genome browser, Computational biology, Biology, Genes, Plant, Genome, Database, User-Computer Interface, Annotation, lcsh:TP248.13-248.65, Genetics, Expressed Sequence Tags, Internet, Oryza sativa, business.industry, Gene Expression Profiling, Chromosome Mapping, food and beverages, Molecular Sequence Annotation, Oryza, Data warehouse, Biotechnology, lcsh:Genetics, DNA microarray, business, Functional genomics, Genome, Plant, Software
Abstract: Background The concurrent release of rice genome sequences for two subspecies (Oryza sativa L. ssp. japonica and Oryza sativa L. ssp. indica) facilitates rice studies at the whole genome level. Since the advent of high-throughput analysis, huge amounts of functional genomics data have been delivered rapidly, making an integrated online genome browser indispensable for scientists to visualize and analyze these data. Based on next-generation web technologies and high-throughput experimental data, we have developed Rice-Map, a novel genome browser for researchers to navigate, analyze and annotate the rice genome interactively. Description More than one hundred annotation tracks (81 for japonica and 82 for indica) have been compiled and loaded into Rice-Map. These pre-computed annotations cover gene models, transcript evidence, expression profiling, epigenetic modifications, inter-species and intra-species homologies, genetic markers and other genomic features. In addition to these pre-computed tracks, registered users can interactively add comments and research notes to Rice-Map as User-Defined Annotation entries. By smoothly scrolling, dragging and zooming, users can browse various genomic features simultaneously at multiple scales. On-the-fly analysis for selected entries can be performed through dedicated bioinformatic analysis platforms such as WebLab and Galaxy. Furthermore, a BioMart-powered data warehouse "Rice Mart" is offered for advanced users to fetch bulk datasets based on complex criteria. Conclusions Rice-Map delivers abundant up-to-date japonica and indica annotations, providing a valuable resource for both computational and bench biologists. Rice-Map is publicly accessible at http://www.ricemap.org/, with all data available for free downloading.
Published: 2011

32. Integration and visualization of systems biology data in context of the genome

Author: Dan Tenenbaum, Tie Koide, Nitin S. Baliga, J. Christopher Bare, and David J Reiss
Subjects: Halobacterium salinarum, Systems biology, Genomics, Context (language use), Genome browser, Computational biology, Biology, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Gaggle, Genome, World Wide Web, 03 medical and health sciences, Genome, Archaeal, Structural Biology, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, 0303 health sciences, Biological data, Gene Expression Profiling, Systems Biology, Applied Mathematics, 030302 biochemistry & molecular biology, Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Gene Expression Regulation, lcsh:Biology (General), Biological data visualization, Bacillus anthracis, lcsh:R858-859.7, Software
Abstract: Background High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
Published: 2010

33. Ensembl variation resources

Author: Bethan Pritchard, Daniel Rios, Paul Flicek, Yuan Chen, Ewan Birney, Eugene Kulesha, Pablo Marin-Garcia, Damian Smedley, Simon Brent, Giulietta Spudich, James Smith, William M. McLaren, and Fiona Cunningham
Subjects: lcsh:QH426-470, Genotype, lcsh:Biotechnology, Population, Context (language use), Genomics, Genome browser, Biology, Polymorphism, Single Nucleotide, Linkage Disequilibrium, Database, Mice, User-Computer Interface, 03 medical and health sciences, 0302 clinical medicine, lcsh:TP248.13-248.65, Server, Databases, Genetic, Genetics, Animals, Humans, Ensembl, education, Phylogeny, 030304 developmental biology, Comparative genomics, Internet, 0303 health sciences, education.field_of_study, Base Sequence, Genetic Variation, Sequence Analysis, DNA, Gene Annotation, Data science, Rats, lcsh:Genetics, ComputingMethodologies_PATTERNRECOGNITION, Phenotype, 030220 oncology & carcinogenesis, Cattle, Algorithms, Biotechnology
Abstract: Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.
Published: 2010

34. A new measurement of sequence conservation

Author: Xiaohui Cai, Haiyan Hu, and Xiaoman Li
Subjects: Chromatin Immunoprecipitation, lcsh:QH426-470, Sequence analysis, lcsh:Biotechnology, Sequence alignment, Genome browser, Biology, Genome, Conserved sequence, Mice, 03 medical and health sciences, 0302 clinical medicine, lcsh:TP248.13-248.65, Research article, Genetics, Animals, Humans, Conserved Sequence, Oligonucleotide Array Sequence Analysis, 030304 developmental biology, Sequence (medicine), Smith–Waterman algorithm, Comparative Genomic Hybridization, 0303 health sciences, Computational Biology, Sequence Analysis, DNA, lcsh:Genetics, Evolutionary biology, DNA microarray, Databases, Nucleic Acid, Sequence Alignment, 030217 neurology & neurosurgery, Biotechnology
Abstract: Background Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously conserved regions and discontiguously conserved regions can be detected based on this new measurement. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified similarity threshold with their homologous regions in other species. That is, conserved regions are good candidates of functional regions and may not be always functional. Moreover, conserved regions may contain long and divergent segments. Results To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. By defining conserved segments using the local alignment tool CHAOS, under the new measurement, we analyzed the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation in at least 11% of these functional regions could be missed by the current conservation analysis methods. We also found that 72% of the mouse homologous regions identified based on the new measurement are more similar to the human functional sequences than the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method. We found that our method picks up many more conserved segments than BLAST and discontiguous MegaBLAST in these regions. 
Conclusions It is critical to have a new measurement of sequence conservation that is based only on the conserved segments in one region. Such a new measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes [1].
Published: 2009

35. Online Training of New Curators

Author: Marc Gillespie and Bijay Jassal
Subjects: World Wide Web, Resource (project management), Literature citation, Community engagement, Data model, Bioinformatics, Computer science, Trainer, Ensembl, General Materials Science, Genome browser, UniProt
Abstract: The basic information in Reactome is provided by bench biologists who are experts on a particular pathway, and the Reactome Team is always working hard to drive engagement. This engagement between experts, curators, editors and reviewers requires maintenance and improvement, and in this sense Reactome is itself a model for large biocuration projects that are driven by community engagement. This tutorial will highlight issues from the perspectives of online training participants: the trainer's and the audience's. From the audience perspective, the tutorial will introduce the concepts that drive the Reactome data model and cover the basic steps that a researcher would have to follow in order to break down a biological pathway into its "reaction-based" Reactome representation. It will also introduce the user to the tools used by authors (the "authortool") and the tools used by curators (the "curatortool") to move that data into the Reactome database. From the trainer perspective, the tutorial will focus on the essential role that a clear explanation of a resource's data model plays in priming the audience for the technical aspects of biocuration. Technical challenges and online delivery methods will be discussed and examples of systems used will be presented with discussion of the negative and positive aspects. Pedagogical models for enhancing audience participation will be briefly presented. The Reactome project is a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and The Gene Ontology Consortium to develop a curated resource of core pathways and reactions in human biology. The information in this database is authored by biological researchers with expertise in their fields, maintained by the Reactome editorial staff, and cross referenced with the sequence databases at NCBI, Ensembl and UniProt, the UCSC Genome Browser, KEGG (Gene and Compound), ChEBI, PubMed and GO.
The information is then managed by groups of curators at CSHL and EBI, peer-reviewed by other researchers and published on the web. While Reactome is targeted at human pathways, it also includes many individual biochemical reactions from non-human systems such as rat, mouse, pufferfish and zebrafish. This makes the database relevant to the many researchers who work on model organisms. All the information in Reactome is backed up by its provenance: either a literature citation or an electronic inference based on sequence similarity. Reactome is a free on-line resource, and Reactome software is open-source.
Published: 2009

36. Curation at the NCBI: Genomes, Genes, & Sequence Standards

Author: Kim D. Pruitt, Garth Brown, Donna Maglott, Janet Weber, Terence Murphy, Melissa J. Landrum, Wendy Wu, Bhanu Rajput, Lillian D. Riddick, David Webb, Michael P. Murphy, Catherine M. Farrell, Bonnie L. Maidak, and Jennifer Hart
Subjects: Entrez Gene, RefSeq, Ensembl, General Materials Science, Human genome, Genome browser, Computational biology, Genome project, Biology, Genome Reference Consortium, Genome
Abstract: The National Center for Biotechnology Information (NCBI) provides curation support for many genomes, and disseminates information in several resources including Entrez Gene, reference sequences (RefSeq), the Consensus CDS (CCDS) database, and the Genome Reference Consortium (GRC). These projects are supported by several collaborations to provide: 1) support to the international consortium maintaining the assemblies for human and mouse (GRC); 2) sequence standards for chromosomes, genes, transcripts and proteins (RefSeq); 3) reports of integrated information including nomenclature, publications, phenotypes and diseases, sequences, ontologies, interactions (Gene); and 4) identification of proteins that are consistently annotated on the human and mouse reference genomes, and consistently updated by collaborating members (CCDS). NCBI curation of any one data type (e.g., a gene) is closely integrated with evaluation of the genome assembly, and determining annotation by way of RefSeq transcript and protein sequences. Database and work-flow infrastructure is designed to support reporting and tracking issues with the assembly, gene, or evidence data to collaborating groups, and to support collaborative review and discussions of issues that arise. Curation depends on publicly available information to represent the gene extent, alternatively spliced transcripts, and protein isoforms. Scientific consults occur regularly and wet-bench validation needs are supported by some of the collaborations. Curation of genome annotation results in improved data presentation at the three major genome browser sites (Ensembl, NCBI, UCSC) and has resulted in efforts to define common curation guidelines to maximize consistency and minimize conflicts. The presentation focuses on curation of the human genome, genes, and RefSeq sequence standards.
Published: 2009

37. GenColors: Annotation and comparative genomics made easy

Author: Alessandro Romualdi, Juergen Suehnel, Marius Felder, Matthias Platzer, and Gernot Gloeckner
Subjects: Comparative genomics, Annotation, ComputingMethodologies_PATTERNRECOGNITION, Computer science, GenBank, General Materials Science, Computational biology, Bacterial genome size, Genome browser, Genome project, Bioinformatics, Genome, Genome comparison
Abstract: GenColors is a web-based software/database system initially aimed at an improved and accelerated annotation of prokaryotic genomes making extensive use of genome comparison (Romualdi et al., Bioinformatics 2005; Romualdi et al., Methods Mol. Biol. 2007). It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. With GenColors, dedicated genome browsers containing a group of related genomes can be easily set up and maintained. The tool has been efficiently used for sequencing and annotating the Borrelia garinii genome and is currently applied to a number of other ongoing genome projects on Legionella, Pseudomonas and E. coli genomes. Examples of freely accessible GenColors-based dedicated genome browsers are the Spirochetes Genome Browser SGB (http://sgb.fli-leibniz.de), the Photogenome Browser CGB (http://cgb.fli-leibniz.de) and the Enterobacter Genome Browser ENGENE (http://engene.fli-leibniz.de). The system has now been adapted to handle eukaryotic genomes as well. A first application of this feature is the annotation and analysis of two fungal species (unpublished). Another GenColors-based tool is the Jena Prokaryotic Genome Viewer - JPGV (http://jpgv.fli-leibniz.de). In contrast to the dedicated browsers, it offers information on almost all finished bacterial genomes. Currently, it includes 1140 genomic elements of 293 species.
Published: 2009

38. easyExon – A Java-based GUI tool for processing and visualization of Affymetrix exon array data

Author: Chi Hung Lin, Yin-Yi Li, Ming Ta Hsu, Tsun-Po Yang, Hsei-Wei Wang, Chih-Hung Jen, and Ting-Yu Chang
Subjects: Information Storage and Retrieval, Genome browser, Computational biology, Biology, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Mice, User-Computer Interface, Exon, Structural Biology, Animals, Humans, lcsh:QH301-705.5, Molecular Biology, Oligonucleotide Array Sequence Analysis, Genetics, Event (computing), Gene Expression Profiling, Applied Mathematics, Alternative splicing, Exons, Pipeline (software), Automatic summarization, Rats, Computer Science Applications, Alternative Splicing, lcsh:Biology (General), lcsh:R858-859.7, DNA microarray, Affymetrix GeneChip Operating Software, Software
Abstract: Background Alternative RNA splicing greatly increases proteome diversity and thereby contributes to species- or tissue-specific functions. The possibility of studying alternative splicing (AS) events on a genomic scale using splicing-sensitive microarrays, including the Affymetrix GeneChip Exon 1.0 ST microarray (exon array), has appeared very recently. However, the application of this new technology is hindered by the lack of free and user-friendly software devoted to these novel platforms. Results In this study we present a Java-based freeware tool, easyExon (http://microarray.ym.edu.tw/easyexon), to process, filter and visualize exon array data with an analysis pipeline. This tool implements the most commonly used probeset summarization methods as well as AS-oriented filtration algorithms, e.g. MIDAS and PAC, for the detection of alternative splicing events. We include a biological filtration function according to GO terms, and provide a module to visualize and interpret the selected exons and transcripts. Furthermore, easyExon can integrate with other related programs, such as Integrated Genome Browser (IGB) and Affymetrix Power Tools (APT), to make the whole analysis more comprehensive. We applied easyExon to a publicly accessible colon cancer dataset as an example to illustrate the analysis pipeline of this tool. Conclusion EasyExon can efficiently process and analyze Affymetrix exon array data. The simplicity, flexibility and brevity of easyExon make it a valuable tool for AS event identification in genomic research.
Published: 2008

39. SNPs in Multi-Species Conserved Sequences (MCS) as useful markers in association studies: a practical approach

Author: Jorge R. Oksenberg, Jacob L. McCauley, Douglas P. Mortlock, Shannon J. Kenealy, Simon G. Gregory, Margaret A. Pericak-Vance, Stephen L. Hauser, Jonathan L. Haines, Elliott H. Margulies, and Nathalie Schnetz-Boutaud
Subjects: Genetic Markers, Candidate gene, Multiple Sclerosis, lcsh:QH426-470, Genotype, Genetic Linkage, lcsh:Biotechnology, Single-nucleotide polymorphism, Computational biology, Genome browser, Biology, Polymorphism, Single Nucleotide, Linkage Disequilibrium, Conserved sequence, lcsh:TP248.13-248.65, Genetics, Animals, Humans, Genetic Predisposition to Disease, Gene, Genetic association, Genome, Human, Methodology Article, lcsh:Genetics, Chromosomes, Human, Pair 1, Human genome, DNA microarray, Biotechnology
Abstract: Background Although genes play a key role in many complex diseases, the specific genes involved in most complex diseases remain largely unidentified. Their discovery will hinge on the identification of key sequence variants that are conclusively associated with disease. While much attention has been focused on variants in protein-coding DNA, variants in noncoding regions may also play many important roles in complex disease by altering gene regulation. Since the vast majority of noncoding genomic sequence is of unknown function, this increases the challenge of identifying "functional" variants that cause disease. However, evolutionary conservation can be used as a guide to indicate regions of noncoding or coding DNA that are likely to have biological function, and thus may be more likely to harbor SNP variants with functional consequences. To help bias marker selection in favor of such variants, we devised a process that prioritizes annotated SNPs for genotyping studies based on their location within Multi-species Conserved Sequences (MCSs) and used this process to select SNPs in a region of linkage to a complex disease. This allowed us to evaluate the utility of the chosen SNPs for further association studies. Previously, a region of chromosome 1q43 was linked to Multiple Sclerosis (MS) in a genome-wide screen. We chose annotated SNPs in the region based on location within MCSs (termed MCS-SNPs). We then obtained genotypes for 478 MCS-SNPs in 989 individuals from MS families. Results Analysis of our MCS-SNP genotypes from the 1q43 region and comparison to HapMap data confirmed that annotated SNPs in MCS regions are frequently polymorphic and show subtle signatures of selective pressure, consistent with previous reports of genome-wide variation in conserved regions. We also present an online tool that allows MCS data to be directly exported to the UCSC genome browser so that MCS-SNPs can be easily identified within genomic regions of interest. 
Conclusion Our results showed that MCS can easily be used to prioritize markers for follow-up and candidate gene association studies. We believe that this novel approach demonstrates a paradigm for expediting the search for genes contributing to complex diseases.
Published: 2007

40. Erratum to: Inference of miRNA targets using evolutionary conservation and pathway analysis

Author: Mihaela Zavolan, Jean Hausser, Dimos Gaidatzis, and Erik van Nimwegen
Subjects: Applied Mathematics, Inference, Regret, Computational biology, Genome browser, Biology, lcsh:Computer applications to medicine. Medical informatics, Pathway analysis, Biochemistry, Genome, humanities, Computer Science Applications, Conserved sequence, lcsh:Biology (General), Structural Biology, microRNA, lcsh:R858-859.7, DNA microarray, lcsh:QH301-705.5, Molecular Biology
Abstract: In our manuscript on miRNA target predictions [1] we made use of a number of genome alignments that we obtained from the UCSC genome browser. We regret that, due to a misunderstanding, we failed to explicitly acknowledge the sequencing centers that made available, before publication, the genome sequences used to construct these alignments.
Published: 2007

41. CASCAD: a database of annotated candidate single nucleotide polymorphisms associated with expressed sequences

Author: Eugene Berezikov, Edwin Cuppen, Victor Guryev, and Hubrecht Institute for Developmental Biology and Stem Cell Research
Subjects: dbSNP, lcsh:QH426-470, Sequence analysis, lcsh:Biotechnology, Information Storage and Retrieval, Sequence assembly, Genomics, Computational biology, Genome browser, Biology, Polymorphism, Single Nucleotide, DNA sequencing, Database, lcsh:TP248.13-248.65, Databases, Genetic, Genetics, Animals, Ensembl, Zebrafish, Expressed Sequence Tags, Internet, Expressed sequence tag, Polymorphism, Genetic, Computational Biology, Sequence Analysis, DNA, Rats, lcsh:Genetics, ComputingMethodologies_PATTERNRECOGNITION, Databases, Nucleic Acid, Sequence Alignment, Software, Biotechnology
Abstract: Background With the recent progress made in large-scale genome sequencing projects a vast amount of novel data is becoming available. A comparative sequence analysis, exploiting sequence information from various resources, can be used to uncover hidden information, such as genetic variation. Although there are enormous amounts of SNPs for a wide variety of organisms submitted to NCBI dbSNP and annotated in most genome assembly viewers like Ensembl and the UCSC Genome Browser, these platforms do not easily allow for extensive annotation and incorporation of experimental data supporting the polymorphism. However, such information is very important for selecting the most promising and useful candidate polymorphisms for use in experimental setups. Description The CASCAD database is designed for presentation and query of candidate SNPs that are retrieved by in silico mining of high-throughput sequencing data. Currently, the database provides collections of laboratory rat (Rattus norvegicus) and zebrafish (Danio rerio) candidate SNPs. The database stores detailed information about raw data supporting the candidate, extensive annotation and links to external databases (e.g. GenBank, Ensembl, UniGene, and LocusLink), verification information, and predictions of a potential effect for non-synonymous polymorphisms in coding regions. The CASCAD website allows search based on an arbitrary combination of 27 different parameters related to characteristics like candidate SNP quality, genomic localization, and sequence data source or strain. In addition, the database can be queried with any custom nucleotide sequences of interest. The interface is crosslinked to other public databases and tightly coupled with primer design and local genome assembly interfaces in order to facilitate experimental verification of candidates. Conclusions The CASCAD database discloses detailed information on rat and zebrafish candidate SNPs, including the raw data underlying its discovery. 
An advanced web-based search interface (http://cascad.niob.knaw.nl) provides universal access to the database content and allows various queries supporting many types of research that use single nucleotide polymorphisms.
Published: 2005

42. Comments on the AZFc markers used for screening of Yq microdeletions

Author: Hamid Reza Khorram Khorshid and Kioomars Saliminejad
Subjects: Male, Genetics, Azoospermia, Chromosomes, Human, Y, Y chromosome microdeletion, Obstetrics and Gynecology, Oligospermia, General Medicine, Genome browser, Biology, Y chromosome, medicine.disease, Human genetics, Reproductive Medicine, medicine, Humans, Chromosome Deletion, Gene, Sex Chromosome Aberrations, Genetics (clinical), Developmental Biology, Reference genome
Abstract: To the Editor: We have read with interest the article by Balkan and colleagues [1] in your November issue that screened Y chromosome microdeletions in infertile males with oligozoospermia and azoospermia in Southeast Turkey. We would like to comment on one false positive result and discuss some of the markers that they used for screening of microdeletions in the AZFc region. We believe that it could be useful for the Journal of Assisted Reproduction and Genetics readers. First, they described one azoospermic patient (Md-58) that was negative for the two markers sY277 and sY153, while positive for sY152. They stated that sY277 is inside the DAZ3 gene. We have defined the relative location of sY277, sY254, sY255, sY153 and sY152 using the primer blast program available at the University of California, Santa Cruz (UCSC) Genome Browser on the MSY (male-specific region of the human Y chromosome) reference sequence [5]. The marker sY277 is located within the DAZ genes and there is at least one copy inside each DAZ gene. Negative results for sY277 indicate a large deletion which includes all copies of the DAZ genes (Fig. 1). Consequently, given the relative position of these markers as clearly depicted by Fig. 1, deletion of sY277 without deletion of sY152 is impossible and should be regarded as a methodological error. Fig. 1 Schematic diagram shows relative position of the markers sY152, sY153, sY254, sY255 and sY277. The figure clearly shows that sY152 is located in the deleted region. The highlighted markers are not recommended by the European Academy of Andrology (EAA) ... Second, they surprisingly stated that the two markers sY254 and sY255 are inside the DAZ1 and DAZ2 genes, respectively. The four DAZ genes are 99.9% identical and there is at least one copy of each marker inside each DAZ gene (Fig. 1) [3]. Third, the three markers sY145, sY152 and sY153, which they attributed to AZFd, are actually inside the AZFc region, and there are at least two copies of each marker in this region. Nowadays, it is accepted that the putative AZFd region does not exist [2, 4]. Finally, to avoid these technical flaws there is a validated guideline which could detect up to 95% of all AZF microdeletions [4].
Published: 2012

43. Genome-wide Mycobacterium tuberculosis variation (GMTV) database: a new tool for integrating sequence variations and epidemiology

Author: Mikhail Rotkevich, Yulia D Isaeva, M Shul'gina, Peter K Yablonsky, Vyacheslav Y. Zhuravlev, Dmitry S. Ischenko, Serguei Simonov, Anna Vyazovaya, Alla Lapidus, Olga Narvskaya, Egor Shitikov, Elena S. Kostryukova, Irina Y. Karpova, Elena N. Ilina, Vadim M. Govorun, Ekaterina Chernyaeva, Pavel Dobrynin, Stephen J. O'Brien, Olga Manicheva, Elena Nosova, and Igor Mokrousov
Subjects: Mycobacterium Tuberculosis Structural Genomics Consortium, Tuberculosis, Genome browser, Drug resistance, computer.software_genre, Genome, Genetic diversity, Database, Mycobacterium tuberculosis, Databases, Genetic, Genome variations, Genetics, medicine, Humans, Indel, Whole genome sequencing, biology, Genetic Variation, biology.organism_classification, medicine.disease, Mutation, computer, Genome, Bacterial, Biotechnology
Abstract: Tuberculosis (TB) poses a worldwide threat due to advancing multidrug-resistant strains and deadly co-infections with human immunodeficiency virus. Today large amounts of Mycobacterium tuberculosis whole genome sequencing data are being assessed broadly, and yet there exists no comprehensive online resource that connects M. tuberculosis genome variants with geographic origin, with drug resistance or with clinical outcome. Here we describe a broadly inclusive unifying Genome-wide Mycobacterium tuberculosis Variation (GMTV) database (http://mtb.dobzhanskycenter.org) that catalogues genome variations of M. tuberculosis strains collected across Russia. GMTV contains a broad spectrum of data derived from different sources and related to M. tuberculosis molecular biology, epidemiology, TB clinical outcome, year and place of isolation, and drug resistance profiles, and displays the variants across the genome using a dedicated genome browser. The GMTV database, which includes 1084 genomes and over 69,000 SNP or Indel variants, can be queried about M. tuberculosis genome variation and putative associations with drug resistance, geographical origin, and clinical stages and outcomes. Implementation of GMTV tracks the pattern of changes of M. tuberculosis strains in different geographical areas, facilitates disease gene discoveries associated with drug resistance or different clinical sequelae, and automates comparative genomic analyses among M. tuberculosis strains.
Published: 2014

44. My5C: web tools for chromosome conformation capture studies

Author: Nynke L. van Berkum, Bryan R. Lajoie, Amartya Sanyal, and Job Dekker
Subjects: Genetics, Internet, Information retrieval, Chromosome Mapping, Sequence Analysis, DNA, Cell Biology, Genome browser, Biology, Biochemistry, Cursor (databases), Genome, Article, Field (computer science), Chromosome conformation capture, Set (abstract data type), User-Computer Interface, Upload, Sliding window protocol, Nucleic Acid Conformation, Molecular Biology, Algorithms, Software, DNA Primers, Biotechnology
Abstract: The three-dimensional arrangement of chromosomes is critical for genome regulation and is the topic of intense research. Chromosome organization can be studied using Chromosome Conformation Capture (3C)-based assays [1,2]. 5C (“3C-Carbon Copy”) is an adaptation of 3C for high-throughput analysis of interaction networks and three-dimensional folding of chromosomes [3]. 5C combines 3C with multiplexed ligation-mediated amplification with pools of primers to detect millions of chromatin interactions in parallel (Fig. 1). The design of large numbers of 5C primers and the handling of large chromatin interaction maps can be daunting. To enable the community to adopt 5C we developed “my5C”, a publicly available set of webtools for all aspects of 5C studies. My5C is hosted at http://my5C.umassmed.edu. Here we highlight the main features of my5C (see also Supplemental Data 1–4). Figure 1 Overview of my5C. (a) Top: 5C technology and locations of forward and reverse primers. Bottom: 5C ligation product. (b) 5C primer design output of my5C.primers showing an alternating design scheme. The triangles indicate restriction fragments. The height ... My5C is composed of two modules. The “my5C.primers” module is used to design 5C primers for restriction fragments throughout user-defined genomic regions (Supplemental Data 5). For analysis of overall three-dimensional conformation, alternating primer design schemes [3] (Fig. 1b) can be selected (Supplemental Data 6). For studies of networks of interactions between specific genomic elements, e.g. promoters and enhancers [3], users can upload files containing genomic coordinates of these elements and my5C.primers will design forward and reverse primers for the two sets (Supplemental Data 7). Primer designs can be downloaded along with a custom microarray probe set for detection of all interactions that the primers interrogate.
In the second module, “my5C.heatmap”, datasets are visualized as two-dimensional heatmaps (Supplemental Data 8–13). Each datapoint corresponds to an interaction frequency between two loci (Fig. 1c). To facilitate exploration of large interaction maps the heatmaps are interactive: when moving the cursor over the heatmap, detailed information is provided regarding the interaction at the cursor position. Clicking an interaction will display interaction profiles across the dataset for each of the interacting loci. My5C.heatmap provides a variety of tools to analyze interaction data. Users can display the difference, ratio or log ratio of two datasets. My5C.heatmap also enables users to identify elements that interact more frequently than expected, which may point to specific looping associations. To identify larger patterns, users can smooth data or perform sliding window analysis (Fig. 1c). My5C.heatmap enables integrating chromosome conformation data with other genomic features. When a position on the heatmap is clicked, links to the UCSC genome browser appear that lead to the corresponding positions in the genome to explore other annotations. Further, lists of genomic annotations, e.g. promoters, can be uploaded and My5C.heatmap will highlight interaction data obtained for these loci in the heatmap. Users can download any data displayed as a heatmap as tables or as lists of pairwise interactions that can be uploaded into Cytoscape for network visualization [4]. Data can be downloaded in UCSC BED format to display data as custom tracks in the UCSC genome browser (Fig. 1d, Supplemental Data 11). This allows users to integrate interaction data with publicly available genome annotations in the genome browser. To ensure confidentiality all data are password protected and users can opt not to store data on the my5C server. My5C provides a critical resource to the emerging field of chromosome conformational studies.
Published: 2009

45. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella

Author: Kazuhisa Miyamoto, Kimiko Yamamoto, Hiroyuki Kanamori, Junko Narukawa, Yuichi Katayose, Kanako Kurita, Seigo Kuwazaki, Masahiro Urio, Yoshitaka Suetsugu, Hiroaki Noda, Takashi Matsumoto, and Akiya Jouraku
Subjects: Molecular Sequence Data, Insect pest, Genomics, Genome browser, Moths, Biology, computer.software_genre, Database, User-Computer Interface, Annotation, KONAGAbase, Putative gene, Representative sequences, Databases, Genetic, Computer Graphics, Genetics, Animals, Plutella xylostella, Internet, Diamondback moth, Gene Expression Profiling, dBm, Genomic and transcriptomic database, biology.organism_classification, Organ Specificity, DNA microarray, computer, Biotechnology
Abstract: The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website ( http://dbm.dna.affrc.go.jp/px/ ) through standard web browsers. 
KONAGAbase provides comprehensive transcriptomic and draft genomic sequences for DBM, with useful annotation information and easy-to-use web interfaces, which help researchers to efficiently search for target sequences such as insect resistance-related genes. KONAGAbase will be continuously updated and additional genomic/transcriptomic resources and analysis tools will be provided for further efficient analysis of the mechanism of insecticide resistance and the development of effective insecticides with a novel mode of action for DBM.
Published: 2013

46. A public gene trap resource for mouse functional genomics

Author: Bruce R. Conklin, William L. Stanford, Patricia Ruiz, Janet Rossant, Alexander Nord, Stephen G. Young, Geoff Hicks, Wolfgang Wurst, William C. Skarnes, Tony Cox, Marc Tessier-Lavigne, Phil Soriano, and Harald von Melchner
Subjects: Genetics, Gene targeting, Genomics, Genome browser, Biology, Genome, Article, Cell Line, Mice, Mutagenesis, Insertional, Gene trapping, GenBank, Databases, Genetic, Animals, Ensembl, Functional genomics, Gene
Abstract: To the editor: Gene trapping is a high-throughput approach that can be used to introduce insertional mutations across the genome in mouse embryonic stem (ES) cells. Gene trap vectors simultaneously mutate and report the expression of the endogenous gene at the site of insertion and provide a DNA tag for the rapid identification of the disrupted gene. The generation of mutant mice from a large collection of ES cell lines carrying gene trap insertions could be applied to large-scale functional analysis of the ~30,000 mammalian genes. The overall impact of gene trap resources will rest on the fraction of the genome that is accessible with this technology, the efficiency relative to other competing technologies and the availability of such a resource to the academic community. Lexicon Genetics, a US-based biotechnology company, was the first to implement a genome-wide gene trapping program [1] and has developed OmniBank (http://www.lexicon-genetics.com), the largest library of mutant ES cell lines. A parallel effort was initiated in the public domain by several academic groups in the International Gene Trap Consortium (IGTC; http://www.igtc.ca). The recent release of OmniBank sequence tags to GenBank [2] has made it possible to compare the size and efficiency of the existing gene trap libraries. We confirm that Lexicon achieved close to 60% coverage of the genome from 200,000 OmniBank sequence tags deposited in GenBank (Fig. 1). Our analysis, supported independently by Lexicon [3], indicates that the rate of trapping new genes was not linear but declined within the first 100,000 tags to a rate at which 1 new gene was added every 35 tags, comparable to the efficiency of high-throughput gene targeting methods [4]. To date, the IGTC has attained 32% genome coverage in 27,000 tags; trapping is likewise nonlinear, but the initial rate seems to be somewhat faster than Lexicon’s (Fig. 1).
The seemingly higher efficiency may relate to the diversity of plasmid and retroviral vector designs used by the IGTC that could help overcome insertion site preferences of any single vector [5]; further studies are needed to fully understand how vector design and other experimental factors influence the efficiency of gene trapping. One-fifth of the genes trapped by the IGTC were not represented in the sequence tags released by Lexicon (Supplementary Tables 1-3 online). Thus, the two efforts together have trapped nearly two-thirds of all genes in mice. We conclude that gene trapping is an effective strategy to mutate a substantial fraction of the genes in mice that compares favorably with gene-targeting approaches. Furthermore, we continue to refine the technology, particularly in developing strategies for postinsertional modification of the trapped loci to create a wide range of desired alleles. The IGTC will provide an important public resource of new mutations in mice that will accelerate the pace of functional annotation of the mammalian genome. Figure 1 Comparison of the rates of trapping of the IGTC and OmniBank resources. Unique Ensembl genes were identified using MAPTAG (http://www.sanger.ac.uk/Software/MAPTAG), an automated annotation program that identifies short, almost perfect matches to gene ... Gene trap cell lines generated by the IGTC are available without restriction (http://baygenomics.ucsf.edu; http://www.genetrap.de; http://www.escells.ca; http://www.sanger.ac.uk/genetrap; http://www.fhcrc.org/labs/soriano/GTdb; http://www.cmhd.ca) and all sequence tags are mapped on the Ensembl mouse genome browser (http://www.ensembl.org/Mus_musculus/; select DAS Source ‘GeneTrap’).
Published: 2004

47. COSMIC: the catalogue of somatic mutations in cancer

Author: Peter J. Campbell, Michael R. Stratton, Charlotte G. Cole, Nidhi Bindal, Andrew Futreal, Kenric Leung, Simon A. Forbes, Jon W. Teague, Sari Ward, David Beare, Sally Bamford, Prasad Gunasekaran, Chai Yin Kok, and Mingming Jia
Subjects: 0106 biological sciences, Genetics, 0303 health sciences, COSMIC cancer database, Genomics, Genome browser, Computational biology, Gene mutation, Biology, 01 natural sciences, Genome, 3. Good health, 03 medical and health sciences, Cancer Genome Project, Poster Presentation, Ensembl, Exome, 030304 developmental biology, 010606 plant biology & botany
Abstract: The Catalogue Of Somatic Mutations In Cancer (COSMIC) [1] is one of the largest repositories of information on somatic mutations in human cancer. The project has been running for more than ten years as part of the Cancer Genome Project (CGP) at the Wellcome Trust Sanger Institute in the UK. The data in COSMIC are curated from a variety of sources, primarily the scientific literature and large international consortia. The project includes information from the CGP, along with data from other consortia such as the International Cancer Genome Consortium and The Cancer Genome Atlas. In addition, COSMIC is regularly updated with the genes highlighted in the Cancer Gene Census, which curates the scientific literature for known cancer genes [2]. With the advent of whole exome and genome sequencing technology, the amount of data in COSMIC is increasing rapidly. The recent COSMIC release (version 53; 18 May 2011) contains 608,042 tumor and cell line samples, annotating 176,856 mutations across 19,439 genes, with 352 full exomes, 43 whole genome rearrangement screens and 4 full genomes now available. The data are updated regularly, with new releases scheduled every two months. COSMIC provides a large number of graphical and tabular views for interpreting and mining the large quantity of information, as well as the facility to export the relevant data in various formats. The website can be navigated in many ways to examine mutation patterns on the basis of genes, samples and phenotypes, which are the main entry points to COSMIC. COSMIC also provides various options to browse the data in a genomic context. Integration with the Ensembl genome browser allows the visualization of full genome annotations, together with COSMIC data, on the GRCh37 genome coordinates. 
COSMIC also contains its own genome browser, which facilitates data analysis by combining genome-wide gene structures and sequences with rearrangement breakpoints, copy number variations and all somatic substitutions, deletions, insertions and complex gene mutations. The main COSMIC website [1] encompasses all of the available data. However, within COSMIC, the Cancer Cell Line Project [3] is a specialized component, which provides details of the genotyping of almost 800 commonly used cancer cell lines, through the set of known cancer genes. Its focus is to identify driver mutations, or those likely to be implicated in the oncogenesis of each tumor. This information forms the basis for integrating COSMIC with the Genomics of Drug Sensitivity in Cancer project [4], which is a joint effort with the Massachusetts General Hospital [5] to screen this panel of cancer cell lines against potential anticancer therapeutic compounds to investigate correlations between somatic mutations and drug sensitivity. Data on somatic mutations in cancer are being produced at a rapidly increasing rate, and the combined analysis of large distributed datasets is becoming ever more difficult. However, COSMIC curates and standardizes this information in a single database, providing user-friendly browsing tools and analytical functions, thus ensuring its role as a key resource in human cancer genetics.
Published: 2011

48. Evolution of gene regulation of pluripotency - the case for wiki tracks at genome browsers

Author: Georg Fuellen and Stephan Struckmann
Subjects: Pluripotent Stem Cells, Immunology, Gene regulatory network, Information Storage and Retrieval, Library science, Genomics, Context (language use), Genome browser, Biology, Genome, General Biochemistry, Genetics and Molecular Biology, Evolution, Molecular, Annotation, Databases, Genetic, Animals, Deep homology, lcsh:QH301-705.5, Ecology, Evolution, Behavior and Systematics, Genetics, Internet, Binding Sites, Integrative bioinformatics, Agricultural and Biological Sciences(all), Biochemistry, Genetics and Molecular Biology(all), Research, Applied Mathematics, Gene Expression Regulation, Developmental, lcsh:Biology (General), Modeling and Simulation, General Agricultural and Biological Sciences, Transcription Factors
Abstract: Background Experimentally validated data on gene regulation are hard to obtain. In particular, information about transcription factor binding sites in regulatory regions is scattered across the literature. This impedes their systematic in-context analysis, e.g. the inference of their conservation in evolutionary history. Results We demonstrate the power of integrative bioinformatics by including curated transcription factor binding site information into the UCSC genome browser, using wiki and custom tracks, which enable easy publication of annotation data. Data integration allows investigation of the evolution of gene regulation of the pluripotency-associated genes Oct4, Sox2 and Nanog. For the first time, experimentally validated transcription factor binding sites in the regulatory regions of all three genes were assembled together based on manual curation of data from 39 publications. Using the UCSC genome browser, these data were then visualized in the context of multi-species conservation based on genomic alignment. We confirm previous hypotheses regarding the evolutionary age of specific regulatory patterns, establishing their "deep homology". We also confirm some other principles of Carroll's "Genetic theory of Morphological Evolution", such as "mosaic pleiotropy", exemplified by the dual role of Sox2 reflected in its regulatory region. Conclusions We were able to elucidate some aspects of the evolution of gene regulation for three genes associated with pluripotency. Based on the expected return on investment for the community, we encourage other scientists to contribute experimental data on gene regulation (original work as well as data collected for reviews) to the UCSC system, to enable studies of the evolution of gene regulation on a large scale, and to report their findings. Reviewers This article was reviewed by Dr. Gustavo Glusman and Dr. Juan Caballero, Institute for Systems Biology, Seattle, USA (nominated by Dr. Doron Lancet, Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel), Dr. Niels Grabe, TIGA Center (BIOQUANT) and Medical Systems Biology Group, Institute of Medical Biometry and Informatics, University Hospital Heidelberg, Germany (nominated by Dr. Mikhail Gelfand, Department of Bioinformatics, Institute of Information Transfer Problems, Russian Academy of Science, Moscow, Russian Federation) and Dr. Franz-Josef Müller, Center for Regenerative Medicine, The Scripps Research Institute, La Jolla, CA, USA and University Hospital for Psychiatry and Psychotherapy (part of ZIP gGmbH), University of Kiel, Germany (nominated by Dr. Trey Ideker, University of California, San Diego, La Jolla CA, United States).
Published: 2010

49. IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

Author: Bongsoo Park, Seunghwan Lee, Jaeyoung Choi, Yong-Hwan Lee, Donghan Kim, Jae-Young Lee, Jongsun Park, Seogchan Kang, Kyohun Ahn, Wonho Song, Kyongyong Jung, and Wonhoon Lee
Subjects: Mitochondrial DNA, Insecta, dbSNP, lcsh:QH426-470, lcsh:Biotechnology, Genomics, Genome browser, Computational biology, Biology, Polymorphism, Single Nucleotide, Genome, Database, Phylogenetics, lcsh:TP248.13-248.65, Genetics, Animals, Phylogeny, Comparative genomics, Base Composition, Phylogenetic tree, Coleoptera, lcsh:Genetics, Genome, Mitochondrial, Databases, Nucleic Acid, Algorithms, Biotechnology
Abstract: Background Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing mitochondrial genome sequences from diverse insects provide ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated into IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser provides graphical presentation of, and an interactive interface for, the identified SNPs/INDELs. Conclusion The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site http://www.imgd.org/.
Published: 2009

50. The clickable genome

Author: Alan Packer
Subjects: Genetics, Genome browser, Clickable, Computational biology, Biology, Molecular Biology, Genome, Genetics (clinical)
Published: 2007
