14 results on '"Segun Jung"'
Search Results
2. Genetic deletion of Sphk2 confers protection against Pseudomonas aeruginosa mediated differential expression of genes related to virulent infection and inflammation in mouse lung
- Author
-
David L. Ebenezer, Panfeng Fu, Yashaswin Krishnan, Mark Maienschein-Cline, Hong Hu, Segun Jung, Ravi Madduri, Zarema Arbieva, Anantha Harijith, and Viswanathan Natarajan
- Subjects
Pseudomonas aeruginosa, Pneumonia, Sphingosine kinase 2, Sphingolipids, Genomics, bacterial resistance, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: Pseudomonas aeruginosa (PA) is an opportunistic Gram-negative bacterium that causes serious life-threatening and nosocomial infections, including pneumonia. PA has the ability to alter the host genome to facilitate its invasion, thus increasing the virulence of the organism. Sphingosine-1-phosphate (S1P), a bioactive lipid, is known to play a key role in facilitating infection. Sphingosine kinases (SPHK) 1 and 2 phosphorylate sphingosine to generate S1P in mammalian cells. We reported earlier that Sphk2 −/− mice offered significant protection against lung inflammation compared to wild type (WT) animals. Therefore, we profiled the differential expression of genes between the protected group of Sphk2 −/− mice and the wild type controls to better understand the underlying protective mechanisms related to the Sphk2 deletion in lung inflammatory injury. Whole transcriptome shotgun sequencing (RNA-Seq) was performed on mouse lung tissue using the NextSeq 500 sequencing system. Results: Two-way analysis of variance (ANOVA) was performed, and differentially expressed genes following PA infection were identified using the whole transcriptomes of Sphk2 −/− mice and their WT counterparts. Pathway (PW) enrichment analyses of the RNA-Seq data identified several signaling pathways that are likely to play a crucial role in pneumonia caused by PA, such as those involved in: 1. Immune response to PA infection and NF-κB signal transduction; 2. PKC signal transduction; 3. Impact on epigenetic regulation; 4. Epithelial sodium channel pathway; 5. Mucin expression; and 6. Bacterial infection related pathways. Our genomic data suggest a potential role for SPHK2 in PA-induced pneumonia through elevated expression of inflammatory genes in lung tissue. Further, validation by RT-PCR on 10 differentially expressed genes showed 100% concordance in the direction of change as well as significant fold change.
Conclusion: Using Sphk2 −/− mice and differential gene expression analysis, we have shown here that S1P/SPHK2 signaling could play a key role in promoting PA pneumonia. The identified genes promote inflammation and suppress others that naturally inhibit inflammation and host defense. Thus, targeting SPHK2/S1P signaling in PA-induced lung inflammation could serve as a potential therapy to combat PA-induced pneumonia.
- Published
- 2019
3. Identification of Genetic and Epigenetic Variants Associated with Breast Cancer Prognosis by Integrative Bioinformatics Analysis
- Author
-
Arunima Shilpi, Yingtao Bi, Segun Jung, Samir K. Patra, and Ramana V. Davuluri
- Subjects
Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
- Abstract
Introduction: Breast cancer is a multifaceted disease with a wide spectrum of histological and molecular variability in tumors. However, identifying this variability is complicated by the interplay between inherited genetic and epigenetic aberrations. This study therefore examines the partnership between DNA methylation and single-nucleotide polymorphisms (SNPs) with a view to identifying prognostic markers in breast cancer. The effect of these SNPs on methylation is defined as methylation quantitative trait loci (meQTL). Materials and Methods: We developed a novel method to identify prognostic gene signatures for breast cancer by integrating genomic and epigenomic data. This is based on the hypothesis that multiple sources of evidence pointing to the same gene or pathway are likely to lead to reduced false positives. We also apply random resampling to reduce overfitting noise by dividing samples into training and testing data sets. Specifically, the common samples between Illumina 450 DNA methylation, Affymetrix SNP array, and clinical data sets obtained from the Cancer Genome Atlas (TCGA) for breast invasive carcinoma (BRCA) were randomly divided into training and test sets. An intensive statistical analysis based on the log-rank test and the Cox proportional hazards model established a significant association between differential methylation and the stratification of breast cancer patients into high- and low-risk groups. Results: The comprehensive assessment based on the joint effect of CpG–SNP pairs guided the stratification of breast cancer patients into high- and low-risk groups. In particular, the most significant association was found for the cg05370838–rs2230576, cg00956490–rs940453, and cg11340537–rs2640785 CpG–SNP pairs. These CpG–SNP pairs were strongly associated with differential expression of the ADAM8, CREB5, and EXPH5 genes, respectively.
In addition, SNPs such as rs10101376, rs140679, and rs1538146 on their own also hold significant prognostic value. Conclusions: Thus, the analysis based on DNA methylation and SNPs has resulted in the identification of novel susceptibility loci that hold prognostic relevance in breast cancer.
- Published
- 2017
4. Predicting helical topologies in RNA junctions as tree graphs.
- Author
-
Christian Laing, Segun Jung, Namhee Kim, Shereef Elmetwaly, Mai Zahran, and Tamar Schlick
- Subjects
Medicine, Science
- Abstract
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires knowledge of their tertiary structures. Though computational RNA folding approaches exist, they often require manual manipulation and expert intuition; predicting global long-range tertiary contacts remains challenging. Here we develop a computational approach and associated program module (RNAJAG) to predict helical arrangements/topologies in RNA junctions. Our method has two components: junction topology prediction and graph modeling. First, junction topologies are determined by a data mining approach from a given secondary structure of the target RNAs; second, the predicted topology is used to construct a tree graph consistent with geometric preferences analyzed from solved RNAs. The predicted graphs, which model the helical arrangements of RNA junctions for a large set of 200 junctions using a cross validation procedure, yield fairly good representations compared to the helical configurations in native RNAs, and can be further used to develop all-atom models as we show for two examples. Because junctions are among the most complex structural elements in RNA, this work advances folding structure prediction methods of large RNAs. The RNAJAG module is available to academic users upon request.
- Published
- 2013
5. A novel MERTK mutation causing retinitis pigmentosa
- Author
-
Segun Jung, Kaanan P. Shah, Michael A. Grassi, Ravi Madduri, Hasenin Al-khersan, and Alex Rodriguez
- Subjects
Male, 0301 basic medicine, Proband, DNA Mutational Analysis, Nonsense mutation, Biology, Retina, Article, 03 medical and health sciences, Cellular and Molecular Neuroscience, 0302 clinical medicine, Locus heterogeneity, medicine, Humans, Exome, Exome sequencing, Genetic testing, Genetics, c-Mer Tyrosine Kinase, medicine.diagnostic_test, Genetic heterogeneity, DNA, MERTK, medicine.disease, Sensory Systems, Pedigree, Ophthalmoscopy, Ophthalmology, 030104 developmental biology, Mutation, 030221 ophthalmology & optometry, Female, Allelic heterogeneity, Retinitis Pigmentosa
- Abstract
Retinitis pigmentosa (RP) is a genetically heterogeneous inherited retinal dystrophy. To date, over 80 genes have been implicated in RP. However, the disease demonstrates significant locus and allelic heterogeneity not entirely captured by current testing platforms. The purpose of the present study was to characterize the underlying mutation in a patient with RP without a molecular diagnosis after initial genetic testing. Whole-exome sequencing of the affected proband was performed. Candidate gene mutations were selected based on adherence to expected genetic inheritance pattern and predicted pathogenicity. Sanger sequencing of MERTK was completed on the patient’s unaffected mother, affected brother, and unaffected sister to determine genetic phase. Eight sequence variants were identified in the proband in known RP-associated genes. Sequence analysis revealed that the proband was a compound heterozygote with two independent mutations in MERTK, a novel nonsense mutation (c.2179C > T) and a previously reported missense variant (c.2530C > T). The proband’s affected brother also had both mutations. Predicted phase was confirmed in unaffected family members. Our study identifies a novel nonsense mutation in MERTK in a family with RP and no prior molecular diagnosis. The present study also demonstrates the clinical value of exome sequencing in determining the genetic basis of Mendelian diseases when standard genetic testing is unsuccessful.
- Published
- 2017
6. O3‐03‐01: MECHANISTIC AND DIRECTIONAL TRANSCRIPTIONAL REGULATORY NETWORKS IN ALZHEIMER'S DISEASE
- Author
-
Matthew A. Richards, Karen N. McFarland, Segun Jung, Nathan D. Price, Alex Rodriguez, Todd E. Golde, Paul Shannon, Paramita Chakrabarty, Mariet Allen, Ravi Madduri, Minerva M. Carrasquillo, Ian Foster, Nilufer Ertekin-Taner, Cory C. Funk, Leroy Hood, Rory Donovan-Maiye, Seth A. Ament, Max Robinson, and Noa Rappaport
- Subjects
Psychiatry and Mental health, Cellular and Molecular Neuroscience, Developmental Neuroscience, Epidemiology, Health Policy, Neurology (clinical), Computational biology, Disease, Geriatrics and Gerontology, Biology
- Published
- 2018
7. Tertiary Motifs Revealed in Analyses of Higher-Order RNA Junctions
- Author
-
Tamar Schlick, Abdul Iqbal, Segun Jung, and Christian Laing
- Subjects
Models, Molecular, Base pair, Stacking, Protein Data Bank (RCSB PDB), RNA, Biology, Article, Crystallography, chemistry.chemical_compound, Models, Chemical, chemistry, Structural Biology, Chemical physics, Nucleic Acid Conformation, Nucleic acid structure, Coaxial, Base Pairing, Molecular Biology, Cytosine
- Abstract
RNA junctions are secondary structure elements formed when three or more helices come together. They are present in diverse RNA molecules with various fundamental functions in the cell. To better understand the intricate architecture of three-dimensional RNAs, we analyze currently solved 3D RNA junctions in terms of basepair interactions and three-dimensional configurations. First, we study basepair interaction diagrams for solved RNA junctions with five to ten helices and discuss common features. Second, we compare these higher-order junctions to those containing three or four helices and identify global motif patterns such as coaxial-stacking and parallel and perpendicular helical configurations. These analyses show that higher order junctions organize their helical components in parallel and helical configurations similar to lower order junctions. Their sub-junctions also resemble local helical configurations found in three and four-way junctions, and are stabilized by similar long-range interaction preferences such as A-minor interactions. Furthermore, loop regions within junctions are high in adenine but low in cytosine. And, in agreement with previous studies, we suggest that coaxial stacking between helices likely forms when the common single stranded loop is small in size; however, other factors such as stacking interactions involving non-canonical basepairs and proteins can greatly determine or disrupt coaxial stacking. Finally, we introduce the ribo-base interactions: when combined with the along-groove packing motif, these ribo-base interactions form novel motifs involved in perpendicular helix-helix interactions. Overall, these analyses suggest recurrent tertiary motifs that stabilize junction architecture, pack helices, and help form helical configurations that occur as sub-elements of larger junction networks. 
The frequent occurrence of similar helical motifs suggests Nature’s finite and perhaps limited repertoire of RNA helical conformation preferences. More generally, studies of RNA junctions and tertiary building blocks can ultimately help in the difficult task of RNA 3D structure prediction.
- Published
- 2009
8. Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping
- Author
-
Segun Jung, Yingtao Bi, and Ramana V. Davuluri
- Subjects
Discretization, exon-array, Feature selection, Biology, Machine Learning, Multiclass classification, 03 medical and health sciences, 0302 clinical medicine, RNA Isoforms, Genetics, Cluster Analysis, Humans, 030304 developmental biology, data discretization, 0303 health sciences, business.industry, Gene Expression Profiling, Research, Computational Biology, platform transition, Pattern recognition, multi-class classification, Class (biology), Expression (mathematics), Random forest, Statistical classification, Identification (information), ComputingMethodologies_PATTERNRECOGNITION, 030220 oncology & carcinogenesis, Artificial intelligence, RNA-seq, Glioblastoma, business, Algorithms, Biotechnology
- Abstract
Background: Many supervised learning algorithms have been applied in deriving gene signatures for patient stratification from gene expression data. However, transferring multi-gene signatures from one analytical platform to another without loss of classification accuracy is a major challenge. Here, we compared three unsupervised data discretization methods (equal-width binning, equal-frequency binning, and k-means clustering) in accurately classifying the four known subtypes of glioblastoma multiforme (GBM) when the classification algorithms were trained on isoform-level gene expression profiles from the exon-array platform and tested on the corresponding profiles from RNA-seq data. Results: We applied an integrated machine learning framework that involves three sequential steps: feature selection, data discretization, and classification. For models trained and tested on exon-array data, the addition of the data discretization step led to robust and accurate predictive models with fewer variables in the final models. For models trained on exon-array data and tested on RNA-seq data, the addition of the data discretization step dramatically improved classification accuracy, with equal-frequency binning showing the highest improvement, at more than 90% accuracy for all models with features chosen by Random Forest based feature selection. Overall, the SVM classifier coupled with equal-frequency binning achieved the best accuracy (> 95%). Without data discretization, however, at most 73.6% accuracy was achieved. Conclusions: The classification algorithms, trained and tested on data from the same platform, yielded similar accuracies in predicting the four GBM subgroups. However, when dealing with cross-platform data, from exon-array to RNA-seq, the classifiers yielded stable models with the highest classification accuracies on data transformed by equal-frequency binning.
The approach presented here is generally applicable to other cancer types for classification and identification of molecular subgroups by integrating data across different gene expression platforms.
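To make the two binning schemes named in this abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names `equal_width_bins` and `equal_frequency_bins`, the choice of three bins, and the toy expression values are all assumptions made for illustration.

```python
from bisect import bisect_right
import math


def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Choose cut points at sample quantiles so bins hold roughly equal counts."""
    ranked = sorted(values)
    n = len(ranked)
    cuts = [ranked[(k * n) // n_bins] for k in range(1, n_bins)]
    # A value's bin is the number of cut points less than or equal to it.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical expression values for one isoform, with one outlier.
expression = [1, 2, 3, 4, 5, 100]
width_labels = equal_width_bins(expression, 3)
freq_labels = equal_frequency_bins(expression, 3)

# The same samples after a monotone rescaling (here, a log transform).
rescaled_labels = equal_frequency_bins([math.log(v) for v in expression], 3)
```

In this toy case the outlier pushes every other sample into the lowest equal-width bin, while the equal-frequency labels depend only on rank and are therefore unchanged by the monotone rescaling. Whether that property explains the cross-platform result reported above is not stated in the abstract.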
- Published
- 2015
9. Development of Bioinformatics Infrastructure for Genomics Research
- Author
-
Nicola J. Mulder, Ezekiel Adebiyi, Marion Adebiyi, Seun Adeyemi, Azza Ahmed, Rehab Ahmed, Bola Akanle, Mohamed Alibi, Don L. Armstrong, Shaun Aron, Efejiro Ashano, Shakuntala Baichoo, Alia Benkahla, David K. Brown, Emile R. Chimusa, Faisal M. Fadlelmola, Dare Falola, Segun Fatumo, Kais Ghedira, Amel Ghouila, Scott Hazelhurst, Itunuoluwa Isewon, Segun Jung, Samar Kamal Kassim, Jonathan K. Kayondo, Mamana Mbiyavanga, Ayton Meintjes, Somia Mohammed, Abayomi Mosaku, Ahmed Moussa, Mustafa Muhammd, Zahra Mungloo-Dilmohamud, Oyekanmi Nashiru, Trust Odia, Adaobi Okafor, Olaleye Oladipo, Victor Osamor, Jellili Oyelade, Khalid Sadki, Samson Pandam Salifu, Jumoke Soyemi, Sumir Panji, Fouzia Radouani, Oussama Souiai, Özlem Tastan Bishop, the H3ABioNet Consortium, as members of the H3Africa Consortium, University of Cape Town, Department of Computer and Information Sciences, Covenant University, Covenant University Bioinformatics Research (CUBRe), University of Khartoum, Laboratoire de Bioinformatique, biomathématiques, biostatistiques (BIMS) (LR11IPT09), Institut Pasteur de Tunis, Réseau International des Instituts Pasteur (RIIP)-Réseau International des Instituts Pasteur (RIIP)-Université de Tunis El Manar (UTM), University of Illinois at Urbana-Champaign [Urbana], University of Illinois System, University of the Witwatersrand [Johannesburg] (WITS), Federal Ministry of Science and Technology [Abuja] (FMST), University of Mauritius, Rhodes University, Grahamstown, Institute of Infectious Diseases and Molecular Medicine (IDM), Future University of Sudan, Laboratoire de Transmission, Contrôle et Immunobiologie des Infections - Laboratory of Transmission, Control and Immunobiology of Infection (LR11IPT02), Réseau International des Instituts Pasteur (RIIP)-Réseau International des Instituts Pasteur (RIIP), Computation Institute [Chicago], University of Chicago, Université Ain Shams, Uganda Virus Research Institute (UVRI), Laboratoire des Technologies de l'Information et de la Communication [Tanger] (Labtic), Ecole Nationale des Sciences Appliquées [Tanger] (ENSAT), Landmark University [Omu-Aran], Université Mohammed V, Kwame Nkrumah University of Science and Technology [GHANA] (KNUST), École polytechnique fédérale d'Ilaro, Institut Pasteur du Maroc, Réseau International des Instituts Pasteur (RIIP), and H3ABioNet is supported by the National Institutes of Health Common Fund (grant number U41HG006941)
- Subjects
0301 basic medicine, MESH: Genomics/methods, Epidemiology, Computer science, [SDV]Life Sciences [q-bio], media_common.quotation_subject, Genomics, MESH: Africa, Bioinformatics, Data type, 03 medical and health sciences, 0302 clinical medicine, Excellence, Controlled vocabulary, media_common, MESH: Computational Biology/trends, Community and Home Care, Spatial data infrastructure, MESH: Humans, Data collection, MESH: Biomedical Research/methods, Data science, Metadata, 030104 developmental biology, Workflow, Cardiology and Cardiovascular Medicine, 030217 neurology & neurosurgery
- Abstract
Background: Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet’s role has evolved in response to changing needs from the consortium and the African bioinformatics community. Objectives: H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Methods and Results: Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types is also being developed and dockerized to enable execution on multiple computing infrastructures.
In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. Conclusions: For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet has significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa. Highlights: H3ABioNet is building capacity to enable analysis of genomic data in Africa. Infrastructure has been built for clinical and genomic data storage, management, and analysis. New algorithms and pipelines for African genomic data analysis have been developed. Data are being harmonized using ontologies to enable easy search and retrieval. Genomics training is implemented using various online and face-to-face approaches.
- Published
- 2017
10. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data
- Author
-
Michael K. Showe, Louise C. Showe, Malik Yousef, and Segun Jung
- Subjects
Male, Clustering high-dimensional data, Computer science, Gene Expression, Feature selection, lcsh:Computer applications to medicine. Medical informatics, computer.software_genre, Biochemistry, Gene interaction, Structural Biology, Databases, Genetic, Feature (machine learning), Humans, Cluster analysis, lcsh:QH301-705.5, Molecular Biology, Regulation of gene expression, business.industry, Gene Expression Profiling, Applied Mathematics, Prostatic Neoplasms, Pattern recognition, Linear discriminant analysis, Computer Science Applications, Gene Expression Regulation, Neoplastic, Support vector machine, Gene expression profiling, Statistical classification, lcsh:Biology (General), Head and Neck Neoplasms, Multigene Family, lcsh:R858-859.7, Data mining, Artificial intelligence, business, computer, Research Article
- Abstract
Background: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes, recursive cluster elimination (RCE), as an alternative to recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) is used to remove genes based on their individual discriminant weights. Conclusion: SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE.
SVM-RCE identifies clusters of correlated genes that when considered together provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups. Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.
- Published
- 2007
11. Learning from positive examples when the negative class is undetermined- microRNA gene identification
- Author
-
Malik Yousef, Segun Jung, Michael K. Showe, and Louise C. Showe
- Subjects
lcsh:QH426-470, business.industry, Computer science, Applied Mathematics, Research, MicroRNA Gene, External validation, Machine learning, computer.software_genre, Matthews correlation coefficient, Support vector machine, Naive Bayes classifier, lcsh:Genetics, lcsh:Biology (General), Computational Theory and Mathematics, Structural Biology, Artificial intelligence, Data mining, business, computer, Classifier (UML), lcsh:QH301-705.5, Molecular Biology
- Abstract
Background: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and, if it is not properly done, can affect the performance of the classifier dramatically and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species. Results: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70–80% versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs. Conclusion: One- and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when they are not well defined. In these cases one-class methods can be superior to two-class methods when the features chosen as representative of the positive class are well defined.
Availability The OneClassmiRNA program is available at: [1]
- Published
- 2008
12. Graph-based sampling for approximating global helical topologies of RNA.
- Author
-
Namhee Kim, Christian Laing, Shereef Elmetwaly, Segun Jung, Jeremy Curuksu, and Tamar Schlick
- Subjects
MOLECULAR structure of RNA, TOPOLOGY, NUCLEOTIDES, DATA mining, STATISTICAL sampling
- Abstract
A current challenge in RNA structure prediction is the description of global helical arrangements compatible with a given secondary structure. Here we address this problem by developing a hierarchical graph sampling/data mining approach to reduce conformational space and accelerate global sampling of candidate topologies. Starting from a 2D structure, we construct an initial graph from size measures deduced from solved RNAs and junction topologies predicted by our data-mining algorithm RNAJAG trained on known RNAs. We sample these graphs in 3D space guided by knowledge-based statistical potentials derived from bending and torsion measures of internal loops as well as radii of gyration for known RNAs. Graph sampling results for 30 representative RNAs are analyzed and compared with reference graphs from both solved structures and predicted structures by available programs. This comparison indicates promise for our graph-based sampling approach for characterizing global helical arrangements in large RNAs: graph rmsds range from 2.52 to 28.24 Å for RNAs of size 25-158 nucleotides, and more than half of our graph predictions improve upon other programs. The efficiency in graph sampling, however, implies an additional step of translating candidate graphs into atomic models. Such models can be built with the same idea of graph partitioning and build-up procedures we used for RNA design. [ABSTRACT FROM AUTHOR]
- Published
- 2014
13. Learning from positive examples when the negative class is undetermined- microRNA gene identification.
- Author
-
Malik Yousef, Segun Jung, Louise C. Showe, and Michael K. Showe
- Subjects
MACHINE learning, RNA, NUCLEOTIDE sequence, GENETICS, MATHEMATICAL models, NUCLEIC acid probes, COMPUTATIONAL biology
- Abstract
Background: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and, if it is not properly done, can affect the performance of the classifier dramatically and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species. Results: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70-80% versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs. Conclusion: One- and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when they are not well defined. In these cases one-class methods can be superior to two-class methods when the features chosen as representative of the positive class are well defined.
Availability: The OneClassmiRNA program is available at: [1] [ABSTRACT FROM AUTHOR]
- Published
- 2008
14. Naive Bayes for microRNA target predictions: machine learning for microRNA targets.
- Author
-
Malik Yousef, Segun Jung, Andrew V. Kossenkov, Louise C. Showe, and Michael K. Showe
- Subjects
MESSENGER RNA, GENETICS, NUCLEOTIDE sequence, ALGORITHMS
- Abstract
Motivation: Most computational methodologies for miRNA:mRNA target gene prediction use the seed segment of the miRNA and require cross-species sequence conservation in this region of the mRNA target. Methods that do not rely on conservation generate too many predictions to validate. We describe a target prediction method (NBmiRTar) that does not require sequence conservation, using instead machine learning by a naïve Bayes classifier. It generates a model from sequence and miRNA:mRNA duplex information from validated targets and artificially generated negative examples. Both the "seed" and "out-seed" segments of the miRNA:mRNA duplex are used for target identification. Results: The application of machine-learning techniques to the features we have used is a useful and general approach for microRNA target gene prediction. Our technique produces fewer false positive predictions and fewer target candidates to be tested. It exhibits higher sensitivity and specificity than algorithms that rely on conserved genomic regions to decrease false positive predictions. Availability: The NBmiRTar program is available at http://wotan.wistar.upenn.edu/NBmiRTar/ Contact: yousef@wistar.org Supplementary information: http://wotan.wistar.upenn.edu/NBmiRTar/ [ABSTRACT FROM AUTHOR]
- Published
- 2007
Catalog
Discovery Service for Jio Institute Digital Library