143 results on '"Brent, Michael R"'
Search Results
102. Constrained optimization for validation-guided conditional random field learning
- Author
-
Chen, Minmin, primary, Chen, Yixin, additional, Brent, Michael R., additional, and Tenney, Aaron E., additional
- Published
- 2009
103. Steady progress and recent breakthroughs in the accuracy of automated genome annotation
- Author
-
Brent, Michael R., primary
- Published
- 2008
104. Using N‐SCAN or TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
- Author
-
van Baren, Marijke J., primary, Koebbe, Brian C., additional, and Brent, Michael R., additional
- Published
- 2007
105. How does eukaryotic gene prediction work?
- Author
-
Brent, Michael R, primary
- Published
- 2007
106. Matrix and Steiner-triple-system smart pooling assays for high-performance transcription regulatory network mapping
- Author
-
Vermeirssen, Vanessa, primary, Deplancke, Bart, additional, Barrasa, M Inmaculada, additional, Reece-Hoyes, John S, additional, Arda, H Efsun, additional, Grove, Christian A, additional, Martinez, Natalia J, additional, Sequerra, Reynaldo, additional, Doucette-Stamm, Lynn, additional, Brent, Michael R, additional, and Walhout, Albertha J M, additional
- Published
- 2007
107. Using ESTs to improve the accuracy of de novo gene prediction
- Author
-
Wei, Chaochun, primary and Brent, Michael R, additional
- Published
- 2006
108. Using Multiple Alignments to Improve Gene Prediction
- Author
-
Gross, Samuel S., primary and Brent, Michael R., additional
- Published
- 2006
109. Gene finding in the chicken genome
- Author
-
Eyras, Eduardo, primary, Reymond, Alexandre, additional, Castelo, Robert, additional, Bye, Jacqueline M, additional, Camara, Francisco, additional, Flicek, Paul, additional, Huckle, Elizabeth J, additional, Parra, Genis, additional, Shteynberg, David D, additional, Wyss, Carine, additional, Rogers, Jane, additional, Antonarakis, Stylianos E, additional, Birney, Ewan, additional, Guigo, Roderic, additional, and Brent, Michael R, additional
- Published
- 2005
110. Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
- Author
-
Hu, Ping, primary and Brent, Michael R., additional
- Published
- 2003
111. Unsupervised learning of morphology using a novel directed search algorithm
- Author
-
Snover, Matthew G., primary, Jarosz, Gaja E., additional, and Brent, Michael R., additional
- Published
- 2002
112. Using Multiple Alignments to Improve Gene Prediction.
- Author
-
Miyano, Satoru, Mesirov, Jill, Kasif, Simon, Istrail, Sorin, Pevzner, Pavel, Waterman, Michael, Gross, Samuel S., and Brent, Michael R.
- Abstract
The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN has the ability to model dependencies between the aligned sequences, context-dependent substitution rates, and insertions and deletions in the sequences. An implementation of N-SCAN was created and used to generate predictions for the entire human genome. An analysis of the predictions reveals that N-SCAN's predictive accuracy in human exceeds that of all previously published whole-genome de novo gene predictors. In addition, predictions were generated for the genome of the fruit fly Drosophila melanogaster to demonstrate the applicability of N-SCAN to invertebrate gene prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2005
113. Chinese text segmentation with MBDP-1
- Author
-
Brent, Michael R., primary and Tao, Xiaopeng, additional
- Published
- 2001
114. A Bayesian model for morpheme and paradigm identification
- Author
-
Snover, Matthew G., primary and Brent, Michael R., additional
- Published
- 2001
115. Speech segmentation and word discovery: a computational perspective
- Author
-
Brent, Michael R., primary
- Published
- 1999
116. Syntactic categorization in early language acquisition: formalizing the role of distributional analysis
- Author
-
Cartwright, Timothy A., primary and Brent, Michael R., additional
- Published
- 1997
117. Automatic semantic classification of verbs from their syntactic contexts
- Author
-
Brent, Michael R., primary
- Published
- 1991
118. Automatic acquisition of subcategorization frames from untagged text
- Author
-
Brent, Michael R., primary
- Published
- 1991
119. Automatic acquisition of subcategorization frames from tagged text
- Author
-
Brent, Michael R., primary and Berwick, Robert C., additional
- Published
- 1991
120. Using ESTs to improve the accuracy of de novo gene prediction.
- Author
-
Chaochun Wei and Brent, Michael R
- Subjects
EXONS (Genetics), SPLIT genes, GENES, NEMATODES, MOLECULAR genetics
- Abstract
Background: ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction. Results: TWINSCAN_EST is a new system that successfully combines EST alignments with TWINSCAN. On the whole C. elegans genome TWINSCAN_EST shows 14% improvement in sensitivity and 13% in specificity in predicting exact gene structures compared to TWINSCAN without EST alignments. Not only are the structures revealed by EST alignments predicted correctly, but these also constrain the predictions without alignments, improving their accuracy. For the human genome, we used the same approach with N-SCAN, creating N-SCAN_EST. On the whole genome, N-SCAN_EST produced a 6% improvement in sensitivity and 1% in specificity of exact gene structure predictions compared to N-SCAN. Conclusion: TWINSCAN_EST and N-SCAN_EST are more accurate than TWINSCAN and N-SCAN, while retaining their ability to discover novel genes to which no ESTs align. Thus, we recommend using the EST versions of these programs to annotate any genome for which EST information is available. TWINSCAN_EST and N-SCAN_EST are part of the TWINSCAN open source software package http://genes.cse.wustl.edu/distribution/download_TS.html. [ABSTRACT FROM AUTHOR]
- Published
- 2006
121. A simplified theory of tense representations and constraints on their composition
- Author
-
Brent, Michael R., primary
- Published
- 1990
122. Cryptococcus neoformans Dual GDP-Mannose Transporters and Their Role in Biology and Virulence
- Author
-
Wang, Zhuo A., Griffith, Cara L., Skowyra, Michael L., Salinas, Nichole, Williams, Matthew, Maier, Ezekiel J., Gish, Stacey R., Liu, Hong, Brent, Michael R., and Doering, Tamara L.
- Abstract
Cryptococcus neoformans is an opportunistic yeast responsible for lethal meningoencephalitis in humans. This pathogen elaborates a polysaccharide capsule, which is its major virulence factor. Mannose constitutes over one-half of the capsule mass and is also extensively utilized in cell wall synthesis and in glycosylation of proteins and lipids. The activated mannose donor for most biosynthetic reactions, GDP-mannose, is made in the cytosol, although it is primarily consumed in secretory organelles. This compartmentalization necessitates specific transmembrane transporters to make the donor available for glycan synthesis. We previously identified two cryptococcal GDP-mannose transporters, Gmt1 and Gmt2. Biochemical studies of each protein expressed in Saccharomyces cerevisiae showed that both are functional, with similar kinetics and substrate specificities in vitro. We have now examined these proteins in vivo and demonstrate that cells lacking Gmt1 show significant phenotypic differences from those lacking Gmt2 in terms of growth, colony morphology, protein glycosylation, and capsule phenotypes. Some of these observations may be explained by differential expression of the two genes, but others suggest that the two proteins play overlapping but nonidentical roles in cryptococcal biology. Furthermore, gmt1 gmt2 double mutant cells, which are unexpectedly viable, exhibit severe defects in capsule synthesis and protein glycosylation and are avirulent in mouse models of cryptococcosis.
- Published
- 2014
123. Leveraging a new data resource to define the response of Cryptococcus neoformans to environmental signals.
- Author
-
Kang, Yu Sung, Jung, Jeffery, Brown, Holly L, Mateusiak, Chase, Doering, Tamara L, and Brent, Michael R
- Subjects
-
*CYCLIC adenylic acid, *IN vitro studies, *RESEARCH funding, *FUNGI, *CULTURE media (Biology), *GENE expression, *CELL culture, *RNA, *CRYPTOCOCCUS, *SEQUENCE analysis
- Abstract
Cryptococcus neoformans is an opportunistic fungal pathogen with a polysaccharide capsule that becomes greatly enlarged in the mammalian host and during in vitro growth under host-like conditions. To understand how individual environmental signals affect capsule size and gene expression, we grew cells in all combinations of 5 signals implicated in capsule size and systematically measured cell and capsule sizes. We also sampled these cultures over time and performed RNA-seq in quadruplicate, yielding 881 RNA-seq samples. Analysis of the resulting data sets showed that capsule induction in tissue culture medium, typically used to represent host-like conditions, requires the presence of either CO2 or exogenous cyclic AMP. Surprisingly, adding either of these pushes overall gene expression in the opposite direction from tissue culture media alone, even though both are required for capsule development. Another unexpected finding was that rich medium blocks capsule growth completely. Statistical analysis further revealed many genes whose expression is associated with capsule thickness; deletion of one of these significantly reduced capsule size. Beyond illuminating capsule induction, our massive, uniformly collected data set will be a significant resource for the research community. [ABSTRACT FROM AUTHOR]
- Published
- 2025
124. Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE
- Author
-
modENCODE Consortium, Roy, Sushmita, Ernst, Jason, Kharchenko, Peter V., Kheradpour, Pouya, Negre, Nicolas, Eaton, Matthew L., Landolin, Jane M., Bristow, Christopher A., Ma, Lijia, Lin, Michael F., Washietl, Stefan, Arshinoff, Bradley I., Ay, Ferhat, Meyer, Patrick E., Robine, Nicolas, Washington, Nicole L., Stefano, Luisa Di, Berezikov, Eugene, Brown, Christopher D., Candeias, Rogerio, Carlson, Joseph W., Carr, Adrian, Jungreis, Irwin, Marbach, Daniel, Sealfon, Rachel, Tolstorukov, Michael Y., Will, Sebastian, Alekseyenko, Artyom A., Artieri, Carlo, Booth, Benjamin W., Brooks, Angela N., Dai, Qi, Davis, Carrie A., Duff, Michael O., Feng, Xin, Gorchakov, Andrey A., Gu, Tingting, Henikoff, Jorja G., Kapranov, Philipp, Li, Renhua, MacAlpine, Heather K., Malone, John, Minoda, Aki, Nordman, Jared, Okamura, Katsutomo, Perry, Marc, Powell, Sara K., Riddle, Nicole C., Sakai, Akiko, Samsonova, Anastasia, Sandler, Jeremy E., Schwartz, Yuri B., Sher, Noa, Spokony, Rebecca, Sturgill, David, Baren, Marijke van, Wan, Kenneth H., Yang, Li, Yu, Charles, Feingold, Elise, Good, Peter, Guyer, Mark, Lowdon, Rebecca, Ahmad, Kami, Andrews, Justen, Berger, Bonnie, Brenner, Steven E., Brent, Michael R., Cherbas, Lucy, Elgin, Sarah R., Gingeras, Thomas R., Grossman, Robert, Hoskins, Roger A., Kaufman, Thomas C., Kent, William, Kuroda, Mitzi I., Orr-Weaver, Terry, Perrimon, Norbert, Pirrotta, Vincenzo, Posakony, James W., Ren, Bing, Russell, Steven, Cherbas, Peter, Graveley, Brenton R., Lewis, Suzanna, Micklem, Gos, Oliver, Brian, Park, Peter J., Celniker, Susan E., Henikoff, Steven, Karpen, Gary H., Lai, Eric C., MacAlpine, David M., Stein, Lincoln D., White, Kevin P., and Kellis, Manolis
- Abstract
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
- Published
- 2010
125. Predicting which genes will respond to transcription factor perturbations.
- Author
-
Yiming Kang, Jung, Wooseok J., and Brent, Michael R.
- Subjects
-
*TRANSCRIPTION factors, *MACHINE learning, *GENE regulatory networks, *LOCATION data, *GENES, *GENE expression
- Abstract
The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge--training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations and others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, the predictive power of histone marks is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer. [ABSTRACT FROM AUTHOR]
- Published
- 2022
126. Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
- Author
-
Hu, Ping and Brent, Michael R.
- Abstract
TWINSCAN is a gene‐prediction system that combines the methods of ab initio predictors like GENSCAN with information derived from genome comparison. This unit describes the use of TWINSCAN to identify gene structures in eukaryotic genomic sequences. Protocols for using TWINSCAN through its Web interface and from the command line in a Linux environment are provided. Detailed discussion about the appropriate parameter settings, input sequence processing, and choice of genome for comparison are included.
- Published
- 2003
127. Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions.
- Author
-
Chaochun Wei, Lamesch, Philippe, Arumugam, Manimozhiyan, Rosenberg, Jennifer, Ping Hu, Vidal, Marc, and Brent, Michael R.
- Subjects
-
*CAENORHABDITIS elegans, *CAENORHABDITIS, *GENOMES, *GENOMICS, *GENETICS, *GENETIC engineering
- Abstract
The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene prediction system to C. elegans and C. briggsae and to improve its splice site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence, proteins—the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase ‘predicted’ genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the ‘parts list’ for even the best-studied model organisms. [ABSTRACT FROM AUTHOR]
- Published
- 2005
128. Identification of Rat Genes by TWINSCAN Gene Prediction, RT-PCR, and Direct Sequencing.
- Author
-
Jia Qian Wu, Shteynberg, David, Arumugam, Manimozhiyan, Gibbs, Richard A., and Brent, Michael R.
- Subjects
-
*GENES, *RATS, *POLYMERASE chain reaction, *GENOMES, *INTRONS, *SPLIT genes
- Abstract
The publication of a draft sequence of a third mammalian genome—that of the rat—suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially or completely missed by methods based on protein-to-genome mapping. Using primers in exons flanking a single predicted intron, we were able to verify the existence of 59% of these predicted genes. We then attempted to amplify the complete predicted open reading frames of 136 genes that were verified in the single-intron experiment. Spliced sequences were amplified in 46 cases (34%). We conclude that this procedure for elucidating gene structures with native cDNA sequences is cost-effective and will become even more so as it is further optimized. [ABSTRACT FROM AUTHOR]
- Published
- 2004
129. Epidemiology and genetic determination of measures of peripheral vascular health in the Long Life Family Study.
- Author
-
Fricke DR, Cvejkus RK, Barinas-Mitchell E, Feitosa MF, Murabito JM, Acharya S, Brent MR, Daw EW, Minster RL, Zmuda JM, and Kuipers AL
- Abstract
Peripheral artery disease (PAD) is a major contributor to morbidity in older adults. We aimed to determine genetic and non-genetic determinants of PAD and ankle-brachial index (ABI) in the Long Life Family Study (LLFS). 3006 individuals had ABI assessment, including 1090 probands (mean age 89), 1554 offspring (mean age 60) and 362 spousal controls (mean age 61). Outcomes include minimum of right and left ABIs and PAD (ABI <0.9). Stepwise regression determined independent significant non-genetic correlates of ABI and PAD. Genomewide association and linkage analyses were adjusted for age, sex, study center, significant principal components, and independent predictors. All analyses accounted for familial relatedness. Median ABI was 1.16 and 7.4% had PAD (18.2% probands, 1.0% offspring, 1.9% controls). Correlates of PAD and lower ABI included age, SBP, and creatinine (ABI only); BMI (ABI only), HDL (ABI only) and DBP (PAD only); and antihypertensive use, current smoking, female sex (ABI only), and high school noncompletion (ABI only). Genomewide linkage identified 1 region (15q12-q13) and association identified 3 single nucleotide polymorphisms (rs780213, rs12512857, rs79644420) of interest. In these families, PAD prevalence was low compared to other studies of older adults. We identified four genomic sites that may harbor variants associated with protection from PAD.
- Published
- 2025
130. Construction of Multi-Modal Transcriptome-Small Molecule Interaction Networks from High-Throughput Measurements to Study Human Complex Traits.
- Author
-
Akbary Moghaddam V, Acharya S, Schwaiger-Haber M, Liao S, Jung WJ, Thyagarajan B, Shriver LP, Daw EW, Saccone NL, An P, Brent MR, Patti GJ, and Province MA
- Abstract
Small molecules (SMs) are integral to biological processes, influencing metabolism, homeostasis, and regulatory networks. Despite their importance, a significant knowledge gap exists regarding their downstream effects on biological pathways and gene expression, largely due to differences in scale, variability, and noise between untargeted metabolomics and sequencing-based technologies. To address these challenges, we developed a multi-omics framework comprising a machine learning-based protocol for data processing, a semi-supervised network inference approach, and network-guided analysis of complex traits. The ML protocol harmonized metabolomic, lipidomic, and transcriptomic data through batch correction, principal component analysis, and regression-based adjustments, enabling unbiased and effective integration. Building on this, we proposed a semi-supervised method to construct transcriptome-SM interaction networks (TSI-Nets) by selectively integrating SM profiles into gene-level networks using a meta-analytic approach that accounts for scale differences and missing data across omics layers. Benchmarking against three conventional unsupervised methods demonstrated the superiority of our approach in generating diverse, biologically relevant, and robust networks. While single-omics analyses identified 18 significant genes and 3 significant SMs associated with insulin sensitivity (IS), network-guided analysis revealed novel connections between these markers. The top-ranked module highlighted a cross-talk between fiber-degrading gut microbiota and immune regulatory pathways, inferred by the interaction of the protective SM, N-acetylglycine (NAG), with immune genes (FCER1A, HDC, MS4A2, and CPA3), linked to improved IS and reduced obesity and inflammation. Together, this framework offers a robust and scalable solution for multi-modal network inference and analysis, advancing SM pathway discovery and their implications for human health.
Leveraging data from a population of thousands of individuals with extended longevity, the inferred TSI-Nets demonstrate generalizability across diverse conditions and complex traits. These networks are publicly available as a resource for the research community. Competing Interests: G.J.P. is a scientific advisory board member for Cambridge Isotope Laboratories and has a collaborative research agreement with Agilent Technologies. G.J.P. is the Chief Scientific Officer of Panome Bio.
- Published
- 2025
131. Leveraging a new data resource to define the response of C. neoformans to environmental signals: How host-like signals drive gene expression and capsule expansion in Cryptococcus neoformans.
- Author
-
Kang YS, Jung J, Brown H, Mateusiak C, Doering TL, and Brent MR
- Abstract
Cryptococcus neoformans is an opportunistic fungal pathogen with a polysaccharide capsule that becomes greatly enlarged in the mammalian host and during in vitro growth under host-like conditions. To understand how individual environmental signals affect capsule size and gene expression, we grew cells in all combinations of five signals implicated in capsule size and systematically measured cell and capsule sizes. We also sampled these cultures over time and performed RNA-Seq in quadruplicate, yielding 881 RNA-Seq samples. Analysis of the resulting data sets showed that capsule induction in tissue culture medium, typically used to represent host-like conditions, requires the presence of either CO2 or exogenous cyclic AMP (cAMP). Surprisingly, adding either of these pushes overall gene expression in the opposite direction from tissue culture media alone, even though both are required for capsule development. Another unexpected finding was that rich medium blocks capsule growth completely. Statistical analysis further revealed many genes whose expression is associated with capsule thickness; deletion of one of these significantly reduced capsule size. Beyond illuminating capsule induction, our massive, uniformly collected dataset will be a significant resource for the research community. Competing Interests: The authors declare that they have no competing interests.
- Published
- 2024
132. Multi-omics Integration Identifies Genes Influencing Traits Associated with Cardiovascular Risks: The Long Life Family Study.
- Author
-
Acharya S, Liao S, Jung WJ, Kang YS, Moghaddam VA, Feitosa M, Wojczynski M, Lin S, Anema JA, Schwander K, Connell JO, Province M, and Brent MR
- Abstract
The Long Life Family Study (LLFS) enrolled 4,953 participants in 539 pedigrees displaying exceptional longevity. To identify genetic mechanisms that affect cardiovascular risks in the LLFS population, we developed a multi-omics integration pipeline and applied it to 11 traits associated with cardiovascular risks. Using our pipeline, we aggregated gene-level statistics from rare-variant analysis, GWAS, and gene expression-trait association by Correlated Meta-Analysis (CMA). Across all traits, CMA identified 64 significant genes after Bonferroni correction (p ≤ 2.8×10⁻⁷), 29 of which replicated in the Framingham Heart Study (FHS) cohort. Notably, 20 of the 29 replicated genes do not have a previously known trait-associated variant in the GWAS Catalog within 50 kb. Thirteen modules in Protein-Protein Interaction (PPI) networks are significantly enriched in genes with low meta-analysis p-values for at least one trait, three of which are replicated in the FHS cohort. The functional annotation of genes in these modules showed a significant over-representation of trait-related biological processes including sterol transport, protein-lipid complex remodeling, and immune response regulation. Among major findings, our results suggest a role of triglyceride-associated and mast-cell functional genes FCER1A, MS4A2, GATA2, HDC, and HRH4 in atherosclerosis risks. Our findings also suggest that lower expression of ATG2A, a gene we found to be associated with BMI, may be both a cause and consequence of obesity. Finally, our results suggest that ENPP3 may play an intermediary role in triglyceride-induced inflammation. Our pipeline is freely available and implemented in the Nextflow workflow language, making it easily runnable on any compute platform (https://nf-co.re/omicsgenetraitassociation). Competing Interests: The authors declare no competing interests.
- Published
- 2024
133. Discovery of genomic and transcriptomic pleiotropy between kidney function and soluble receptor for advanced glycation end-products using correlated meta-analyses: The Long Life Family Study (LLFS).
- Author
-
Feitosa MF, Lin SJ, Acharya S, Thyagarajan B, Wojczynski MK, Kuipers AL, Kulminski A, Christensen K, Zmuda JM, Brent MR, and Province MA
- Abstract
Patients with chronic kidney disease (CKD) have increased oxidative stress and chronic inflammation, which may escalate the production of advanced glycation end-products (AGE). High soluble receptor for AGE (sRAGE) and low estimated glomerular filtration rate (eGFR) levels are associated with CKD and aging. We evaluated whether eGFR calculated from creatinine and cystatin C share pleiotropic genetic factors with sRAGE. We employed whole-genome sequencing and correlated meta-analyses on combined genomewide association study (GWAS) p-values in 4,182 individuals (age range: 24-110) from the Long Life Family Study (LLFS). We also conducted transcriptome-wide association studies (TWAS) on whole blood in a subset of 1,209 individuals. We identified 59 pleiotropic GWAS loci (p < 5×10⁻⁸) and 17 TWAS genes (Bonferroni p < 2.73×10⁻⁶) for eGFR traits and sRAGE. TWAS genes, LSP1 and MIR23AHG, were associated with eGFR and sRAGE located within GWAS loci, lncRNA-KCNQ1OT1 and CACNA1A/CCDC130, respectively. GWAS variants were eQTLs in the kidney glomeruli and tubules, and GWAS genes predicted kidney carcinoma. TWAS genes harbored eQTLs in the kidney, predicted kidney carcinoma, and connected enhancer-promoter variants with kidney function-related phenotypes at p < 5×10⁻⁸. Additionally, higher allele frequencies of protective variants for eGFR traits were detected in LLFS than in ALFA-Europeans and TOPMed, suggesting better kidney function in healthy-aging LLFS than in general populations. Integrating genomic annotation and transcriptional gene activity revealed the enrichment of genetic elements in kidney function and kidney diseases. The identified pleiotropic loci and gene expressions for eGFR and sRAGE suggest their underlying shared genetic effects and highlight their roles in kidney- and aging-related signaling pathways.
- Published
- 2023
134. Calling Cards: a customizable platform to longitudinally record protein-DNA interactions over time in cells and tissues.
- Author
-
Yen A, Mateusiak C, Sarafinovska S, Gachechiladze MA, Guo J, Chen X, Moudgil A, Cammack AJ, Hoisington-Lopez J, Crosby M, Brent MR, Mitra RD, and Dougherty JD
- Abstract
Calling Cards is a platform technology to record a cumulative history of transient protein-DNA interactions in the genome of genetically targeted cell types. The record of these interactions is recovered by next generation sequencing. Compared to other genomic assays, whose readout provides a snapshot at the time of harvest, Calling Cards enables correlation of historical molecular states to eventual outcomes or phenotypes. To achieve this, Calling Cards uses the piggyBac transposase to insert self-reporting transposon (SRT) "Calling Cards" into the genome, leaving permanent marks at interaction sites. Calling Cards can be deployed in a variety of in vitro and in vivo biological systems to study gene regulatory networks involved in development, aging, and disease. Out of the box, it assesses enhancer usage but can be adapted to profile specific transcription factor binding with custom transcription factor (TF)-piggyBac fusion proteins. The Calling Cards workflow has five main stages: delivery of Calling Card reagents, sample preparation, library preparation, sequencing, and data analysis. Here, we first present a comprehensive guide for experimental design, reagent selection, and optional customization of the platform to study additional TFs. Then, we provide an updated protocol for the five steps, using reagents that improve throughput and decrease costs, including an overview of a newly deployed computational pipeline. This protocol is designed for users with basic molecular biology experience to process samples into sequencing libraries in 1-2 days. Familiarity with bioinformatic analysis and command line tools is required to set up the pipeline in a high-performance computing environment and to conduct downstream analyses. Basic Protocol 1: Preparation and delivery of Calling Cards reagents. Basic Protocol 2: Sample preparation. Basic Protocol 3: Sequencing library preparation. Basic Protocol 4: Library pooling and sequencing. Basic Protocol 5: Data analysis.
- Published
- 2023
- Full Text
- View/download PDF
135. Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data.
- Author
-
Ma CZ and Brent MR
- Subjects
- DNA-Binding Proteins, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Saccharomyces cerevisiae genetics, Saccharomyces cerevisiae metabolism, Saccharomyces cerevisiae Proteins genetics, Saccharomyces cerevisiae Proteins metabolism, Transcription Factors genetics, Transcription Factors metabolism
- Abstract
Motivation: The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now. Results: We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2. Availability and Implementation: Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2021
- Full Text
- View/download PDF
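The abstract of entry 135 describes factoring a gene expression matrix into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activities. The sketch below illustrates that general idea with plain alternating least squares. It is not the authors' method or code: the function name `factor_expression`, the boolean `mask` of allowed TF-gene edges, and the iteration count are all assumptions made for illustration, and the perturbation-data constraints the paper evaluates are not modeled.

```python
import numpy as np

def factor_expression(E, mask, n_iter=50, seed=0):
    """Approximate E (genes x samples) as C @ A.

    C (genes x TFs) holds control strengths and is forced to zero
    wherever mask is False; A (TFs x samples) holds TF activities.
    Hypothetical illustration using alternating least squares.
    """
    rng = np.random.default_rng(seed)
    n_genes, _ = E.shape
    # Random start that already respects the allowed-edge pattern.
    C = rng.normal(size=mask.shape) * mask
    A = None
    for _ in range(n_iter):
        # Step 1: best activities for the current control strengths.
        A = np.linalg.lstsq(C, E, rcond=None)[0]
        # Step 2: best control strengths per gene, using only allowed TFs.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                C[g, idx] = np.linalg.lstsq(A[idx].T, E[g], rcond=None)[0]
    return C, A
```

Each step exactly minimizes the squared reconstruction error over one factor while the other is held fixed, so the error never increases from one iteration to the next. The factorization is only determined up to a rescaling of each TF's column of `C` and row of `A`.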
136. The transcriptional diversity of 25 Drosophila cell lines.
- Author
-
Cherbas L, Willingham A, Zhang D, Yang L, Zou Y, Eads BD, Carlson JW, Landolin JM, Kapranov P, Dumais J, Samsonova A, Choi JH, Roberts J, Davis CA, Tang H, van Baren MJ, Ghosh S, Dobin A, Bell K, Lin W, Langton L, Duff MO, Tenney AE, Zaleski C, Brent MR, Hoskins RA, Kaufman TC, Andrews J, Graveley BR, Perrimon N, Celniker SE, Gingeras TR, and Cherbas P
- Subjects
- Animals, Cell Line, Cluster Analysis, Exons, Female, Gene Expression Profiling, Male, Molecular Sequence Data, Signal Transduction genetics, Transcription Factors genetics, Drosophila melanogaster genetics, Genetic Variation, Transcription, Genetic
- Abstract
Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are "off" and survival/growth pathways "on." Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common "cell line" gene expression pattern.
- Published
- 2011
- Full Text
- View/download PDF
137. Pairagon+N-SCAN_EST: a model-based gene annotation pipeline.
- Author
-
Arumugam M, Wei C, Brown RH, and Brent MR
- Subjects
- Base Sequence, Computational Biology standards, DNA, Complementary analysis, Genes, Genome, Human, Genomics standards, Humans, Models, Statistical, Open Reading Frames, Phylogeny, RNA, Messenger analysis, Computational Biology methods, Expressed Sequence Tags, Genomics methods, Sequence Alignment, Software
- Abstract
Background: This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci of putative homologs. Trans alignments contain a high proportion of mismatches, gaps, and/or apparently unspliceable introns, compared to alignments of cDNA sequences to their native loci. The Pairagon+N-SCAN_EST pipeline's first stage is Pairagon, a cDNA-to-genome alignment program based on a PairHMM probability model. This model relies on prior knowledge, such as the fact that introns must begin with GT, GC, or AT and end with AG or AC. It produces very precise alignments of high quality cDNA sequences. In the genomic regions between Pairagon's cDNA alignments, the pipeline combines EST alignments with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST alignments. Because they are based on probability models, both Pairagon and N-SCAN_EST can be trained automatically for new genomes and data sets. Results: On the ENCODE regions of the human genome, Pairagon+N-SCAN_EST was as accurate as any other system tested in the EGASP assessment, including ENSEMBL and ExoGean. Conclusion: With sufficient mRNA/EST evidence, genome annotation without trans alignments can compete successfully with systems like ENSEMBL and ExoGean, which use trans alignments.
- Published
- 2006
- Full Text
- View/download PDF
138. Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts.
- Author
-
Flicek P and Brent MR
- Subjects
- Algorithms, Animals, Chickens genetics, Dogs, Genes, Genome, Human, Humans, Mice, Opossums genetics, Ranidae, Rats, Sequence Alignment, Software, Alternative Splicing, Computational Biology methods, Genomics methods, RNA, Messenger genetics
- Abstract
Background: As part of the ENCODE Genome Annotation Assessment Project (EGASP), we developed the MARS extension to the Twinscan algorithm. MARS is designed to find human alternatively spliced transcripts that are conserved in only one or a limited number of extant species. MARS is able to use an arbitrary number of informant sequences and predicts a number of alternative transcripts at each gene locus. Results: MARS uses the mouse, rat, dog, opossum, chicken, and frog genome sequences as pairwise informant sources for Twinscan and combines the resulting transcript predictions into genes based on coding (CDS) region overlap. Based on the EGASP assessment, MARS is one of the more accurate dual-genome prediction programs. Compared to the GENCODE annotation, we find that predictive sensitivity increases, while specificity decreases, as more informant species are used. MARS correctly predicts alternatively spliced transcripts for 11 of the 236 multi-exon GENCODE genes that are alternatively spliced in the coding region of their transcripts. For these genes a total of 24 correct transcripts are predicted. Conclusion: The MARS algorithm is able to predict alternatively spliced transcripts without the use of expressed sequence information, although the number of loci in which multiple predicted transcripts match multiple alternatively spliced transcripts in the GENCODE annotation is relatively small.
- Published
- 2006
- Full Text
- View/download PDF
139. Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment.
- Author
-
Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, and Tan SL
- Subjects
- Computational Biology standards, Databases, Genetic, Genes, Genomics standards, Humans, RNA, Messenger analysis, Sequence Analysis, DNA, Sequence Analysis, RNA, Computational Biology methods, Genome, Human, Genomics methods, Promoter Regions, Genetic
- Abstract
Background: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. Results: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. Conclusion: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
- Published
- 2006
- Full Text
- View/download PDF
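The abstract of entry 139 reports sensitivity and positive predictive value for promoter predictions counted within a maximum distance of 1,000 nucleotides from reference transcription start sites. The sketch below shows one plausible way to compute such figures on a single strand of a single sequence. The matching rule is an assumption, not taken from the paper: a reference site counts as found if any prediction lies within `max_dist`, and a prediction counts as correct if any reference site lies within `max_dist`. The function name `score_predictions` and the coordinates are hypothetical.

```python
def score_predictions(predicted, reference, max_dist=1000):
    """Return (sensitivity, ppv) for predicted vs. reference positions.

    Assumed matching rule: two positions match when they are at most
    max_dist nucleotides apart.
    """
    # Reference sites that have at least one nearby prediction.
    found_refs = sum(
        any(abs(p - r) <= max_dist for p in predicted) for r in reference
    )
    # Predictions that have at least one nearby reference site.
    correct_preds = sum(
        any(abs(p - r) <= max_dist for r in reference) for p in predicted
    )
    sensitivity = found_refs / len(reference) if reference else 0.0
    ppv = correct_preds / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Made-up coordinates: two of four reference sites are found,
# and two of three predictions are correct.
sens, ppv = score_predictions([100, 5000, 20000], [600, 5200, 90000, 91000])
```

Under this rule, narrowing the set of positions where predictions are allowed removes false positives without changing which reference sites can be found, which fits the abstract's explanation of why combining promoter prediction with gene prediction helped.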
140. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans.
- Author
-
Loftus BJ, Fung E, Roncaglia P, Rowley D, Amedeo P, Bruno D, Vamathevan J, Miranda M, Anderson IJ, Fraser JA, Allen JE, Bosdet IE, Brent MR, Chiu R, Doering TL, Donlin MJ, D'Souza CA, Fox DS, Grinberg V, Fu J, Fukushima M, Haas BJ, Huang JC, Janbon G, Jones SJ, Koo HL, Krzywinski MI, Kwon-Chung JK, Lengeler KB, Maiti R, Marra MA, Marra RE, Mathewson CA, Mitchell TG, Pertea M, Riggs FR, Salzberg SL, Schein JE, Shvartsbeyn A, Shin H, Shumway M, Specht CA, Suh BB, Tenney A, Utterback TR, Wickes BL, Wortman JR, Wye NH, Kronstad JW, Lodge JK, Heitman J, Davis RW, Fraser CM, and Hyman RW
- Subjects
- Alternative Splicing, Cell Wall metabolism, Chromosomes, Fungal genetics, Computational Biology, Cryptococcus neoformans pathogenicity, Cryptococcus neoformans physiology, DNA Transposable Elements, Fungal Proteins metabolism, Gene Library, Genes, Fungal, Humans, Introns, Molecular Sequence Data, Phenotype, Polymorphism, Genetic, Polymorphism, Single Nucleotide, Polysaccharides metabolism, RNA, Antisense, Sequence Analysis, DNA, Transcription, Genetic, Virulence, Virulence Factors metabolism, Cryptococcus neoformans genetics, Genome, Fungal
- Abstract
Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
- Published
- 2005
- Full Text
- View/download PDF
141. Reexamining the vocabulary spurt.
- Author
-
Ganger J and Brent MR
- Subjects
- Child, Child Language, Cognition, Female, Humans, Male, Time Factors, Twins psychology, Verbal Learning, Vocabulary
- Abstract
The authors asked whether there is evidence to support the existence of the vocabulary spurt, an increase in the rate of word learning that is thought to occur during the 2nd year of life. Using longitudinal data from 38 children, they modeled the rate of word learning with two functions, one with an inflection point (logistic), which would indicate a spurt, and one without an inflection point (quadratic). Comparing the fits of these two functions using likelihood ratios, they found that just 5 children had a better logistic fit, which indicated that these children underwent a spurt. The implications for theories of cognitive and language development are considered. (Copyright 2004 APA, all rights reserved)
- Published
- 2004
- Full Text
- View/download PDF
142. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.
- Author
-
Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D'Eustachio P, Fitch DH, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R, and Waterston RH
- Subjects
- Animals, Biological Evolution, Chromosome Mapping, Chromosomes, Artificial, Bacterial, Cluster Analysis, Codon, Conserved Sequence, Evolution, Molecular, Exons, Gene Library, Interspersed Repetitive Sequences, Introns, MicroRNAs genetics, Models, Genetic, Models, Statistical, Molecular Sequence Data, Multigene Family, Open Reading Frames, Physical Chromosome Mapping, Plasmids metabolism, Protein Structure, Tertiary, Proteins chemistry, RNA chemistry, RNA, Ribosomal genetics, RNA, Spliced Leader, RNA, Transfer genetics, Sequence Analysis, DNA, Species Specificity, Caenorhabditis genetics, Caenorhabditis elegans genetics, Genome, Genomics methods
- Abstract
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes. Competing Interests: The authors have declared that no conflicts of interest exist.
- Published
- 2003
- Full Text
- View/download PDF
143. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.
- Author
-
Guigo R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril JF, Keibler E, Lyle R, Ucla C, Antonarakis SE, and Brent MR
- Subjects
- Amino Acid Sequence, Animals, Exons, Genetic Techniques, Humans, Introns, Mice, Molecular Sequence Data, Reverse Transcriptase Polymerase Chain Reaction, Sequence Analysis, DNA, Sequence Homology, Amino Acid, Tissue Distribution, Genome, Genome, Human
- Abstract
A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.
- Published
- 2003
- Full Text
- View/download PDF