Author: "Rozana Rosli" / Journal: BMC Bioinformatics - Searchworks@Jio Institute Digital Library Search Results

1. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data

Authors: Michael Hogan, Kuang-Lim Chan, Rozana Rosli, Tatiana V. Tatarinova, Mohd Firdaus-Raih, and Eng-Ti Leslie Low
Subjects: 0301 basic medicine, Gene prediction, Pipeline (computing), Arabidopsis, Biology, computer.software_genre, Biochemistry, Genome, 03 medical and health sciences, Structural Biology, Hidden Markov model, Molecular Biology, Gene, Applied Mathematics, Gene Expression Profiling, Oryza, Genome project, Exons, Genomics, Markov Chains, Computer Science Applications, Pipeline transport, 030104 developmental biology, Data mining, DNA microarray, Transcriptome, computer, Genome, Plant, Software
Abstract: Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines, developed using various computational techniques, are available for gene prediction. However, these systems have yet to accurately predict all, or even most, of the protein-coding regions. Furthermore, none of the currently available gene finders has a universal Hidden Markov Model (HMM) that can perform gene prediction equally well for all organisms in an automatic fashion. We present Seqping, an automated gene prediction pipeline that uses self-training HMMs and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program, which combines the predictions from the three tools together with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO’s plantae dataset. Our evaluation shows that Seqping generated better gene predictions than three HMM-based programs (MAKER2, GlimmerHMM, and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure). Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that Seqping's predictions are more accurate than those of the other three approaches using the default or available HMMs.
Published: 2017