1. UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase
- Author
Elena Speretta, Marc Feuermann, Paul Denny, Yvonne Lussi, Antonia Lock, Alexandr Ignatchenko, Philippe Le Mercier, Dushyanth Jyothi, Alexandre Renaux, Ivo Pedruzzi, Emmanuel Boutet, Emanuele Alpi, Claire O'Donovan, Edward Turner, Sandra Orchard, Patrick Masson, Alex Bateman, Peter McGarvey, Emma Hatton-Ellis, Michele Magrane, Alan Bridge, Hema Bye-A-Jee, Ramona Britto, Giuseppe Insana, Shriya Raj, Maria-Jesus Martin, Catherine Rivoire, Penelope Garmiri, Emily Bowler-Barnett, Vishal Joshi, Kate Warner, and Faculty of Sciences and Bioengineering Sciences
- Subjects
InterPro; Statistics and Probability; Protein family; AcademicSubjects/SCI01060; Computer science; Knowledge Bases; Databases and Ontologies; Biochemistry; DNA sequencing; 03 medical and health sciences; Annotation; Resource (project management); Databases, Protein; Gene; Molecular Biology; 030304 developmental biology; 0303 health sciences; Information retrieval; 030302 biochemistry & molecular biology; Chromosome Mapping; Proteins; Molecular Sequence Annotation; Corrigenda; Original Papers; Computer Science Applications; Computational Mathematics; Functional annotation; Computational Theory and Mathematics; UniProt Knowledgebase; UniProt
- Abstract
Motivation: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge.
Results: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules. It integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins that have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules, which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB.
Availability and implementation: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run them are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
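The selection-and-propagation idea described in the abstract can be illustrated with a minimal sketch. It is not the UniFIRE implementation or the UniRule rule format: the `Rule` and `Protein` classes, the `rule_applies` and `propagate` functions, and all identifiers such as `SIG_A` and `RULE_X` are hypothetical, and a real rule may carry further constraints that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified stand-in for a curated annotation rule."""
    rule_id: str
    required_signatures: frozenset  # family signature IDs that must all be present
    taxon: str                      # taxonomic constraint, matched against the lineage
    annotation: str                 # annotation text to propagate


@dataclass
class Protein:
    """Hypothetical, simplified protein record."""
    accession: str
    signatures: set
    lineage: set
    reviewed: bool
    annotations: list = field(default_factory=list)


def rule_applies(rule, protein):
    """A rule applies when every required signature matches and the taxon is in the lineage."""
    return rule.required_signatures <= protein.signatures and rule.taxon in protein.lineage


def propagate(rules, proteins):
    """Attach rule annotations to unreviewed proteins; return how many proteins were annotated."""
    annotated = 0
    for protein in proteins:
        if protein.reviewed:
            continue  # reviewed records are the source of annotation, not the target
        hits = [rule for rule in rules if rule_applies(rule, protein)]
        for rule in hits:
            protein.annotations.append((rule.rule_id, rule.annotation))
        if hits:
            annotated += 1
    return annotated


# Illustrative data only; none of these identifiers are real.
rules = [Rule("RULE_X", frozenset({"SIG_A", "SIG_B"}), "Bacteria", "hypothetical function X")]
proteins = [
    Protein("ACC1", {"SIG_A", "SIG_B"}, {"Bacteria"}, reviewed=False),   # meets all criteria
    Protein("ACC2", {"SIG_A"}, {"Bacteria"}, reviewed=False),            # missing a signature
    Protein("ACC3", {"SIG_A", "SIG_B"}, {"Eukaryota"}, reviewed=False),  # wrong taxon
]
count = propagate(rules, proteins)
```

In this toy data set only `ACC1` satisfies both the signature and the taxonomic constraint, so it is the only record that receives the annotation.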
- Published
- 2020