1. smORFer: a modular algorithm to detect small ORFs in prokaryotes
- Author
Ingrid Goebel, Alexander Bartholomäus, Stephan Fuchs, Zoya Ignatova, Ayten Mustafayeva, Susanne Engelmann, Dirk Benndorf, Baban Kolte, and HZI, Helmholtz-Zentrum für Infektionsforschung GmbH, Inhoffenstr. 7, 38124 Braunschweig, Germany.
- Subjects
AcademicSubjects/SCI00010, Computer science, Computational biology, Biology, Ribosome, Genome, Deep sequencing, Open Reading Frames, 03 medical and health sciences, Annotation, 0302 clinical medicine, Genetics, Feature (machine learning), Ribosome profiling, ORFS, Narese/7, 030304 developmental biology, chemistry.chemical_classification, Messenger RNA, 0303 health sciences, Computational Biology, Eukaryota, Molecular Sequence Annotation, Translation (biology), Genome project, Amino acid, Narese/27, Open reading frame, Narese/24, Prokaryotic Cells, chemistry, Protein Biosynthesis, Methods Online, Ribosomes, Algorithms, 030217 neurology & neurosurgery
- Abstract
Emerging evidence places small proteins (≤ 50 amino acids) more centrally in physiological processes. Yet the identification of functional small proteins and the systematic genome annotation of their cognate small open reading frames (smORFs) remain challenging, both experimentally and computationally. Ribosome profiling, or Ribo-Seq (deep sequencing of ribosome-protected fragments), enables the detection of actively translated open reading frames (ORFs) and the empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple ORF identifiers that use the 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation. However, they have difficulty evaluating prokaryotic genomes because of their unique architecture (e.g. polycistronic messages, overlapping ORFs, leaderless translation, non-canonical initiation). Here, we present our new algorithm, smORFer, which detects smORFs in prokaryotic organisms with high accuracy. The unique feature of smORFer is its integrated approach: it considers structural features of the genetic sequence along with in-register translation, and it uses a Fourier transform to convert these parameters into a measurable score to faithfully select smORFs. The algorithm is executed in a modular way, so that, depending on the data available for a particular organism, different modules can be used for the smORF search.
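The idea of turning 3-nt periodicity into a score with a Fourier transform can be illustrated with a minimal sketch. This is not smORFer's implementation: the function name `periodicity_score`, the input (a per-nucleotide read-count profile across a candidate ORF) and the choice of a power ratio as the score are all assumptions made for illustration.

```python
import numpy as np


def periodicity_score(coverage):
    """Hypothetical score: spectral power at a 3-nt period divided by the
    mean power at all other non-zero frequencies of the profile."""
    x = np.asarray(coverage, dtype=float)
    n = len(x)
    if n < 6 or n % 3 != 0:
        raise ValueError("profile length must be a multiple of 3 and at least 6")

    # Remove the mean so the zero-frequency term does not dominate.
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2

    # Frequency bin k corresponds to k/n cycles per nt; k = n/3 is a 3-nt period.
    k = n // 3
    background = np.delete(power[1:], k - 1).mean()

    if background == 0.0:
        # No power outside the 3-nt bin: perfectly periodic or completely flat.
        return float("inf") if power[k] > 0.0 else 0.0
    return float(power[k] / background)


# Illustrative profiles (made-up numbers, not real Ribo-Seq data).
in_frame = np.tile([10, 1, 1], 10)   # reads concentrated in one codon position
ramp = np.arange(30)                 # smooth trend without codon periodicity

score_in_frame = periodicity_score(in_frame)
score_ramp = periodicity_score(ramp)
```

In this sketch a profile with reads concentrated in one codon position yields a large ratio, whereas a smooth profile without a 3-nt pattern yields a ratio below 1. Any cut-off applied to such a score would have to be calibrated on real data.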
- Published
2021