Database-independent molecular formula annotation using Gibbs sampling through ZODIAC
- Source :
- Nature Machine Intelligence. 2:629-641
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- The confident high-throughput identification of small molecules is one of the most challenging tasks in mass spectrometry-based metabolomics. Annotating the molecular formula of a compound is the first step towards its structural elucidation. Yet even the annotation of molecular formulas remains highly challenging. This is particularly so for large compounds above 500 daltons, and for de novo annotations, for which we consider all chemically feasible formulas. Here we present ZODIAC, a network-based algorithm for the de novo annotation of molecular formulas. Uniquely, it enables fully automated and swift processing of complete experimental runs, providing high-quality, high-confidence molecular formula annotations. This allows us to annotate novel molecular formulas that are absent from even the largest public structure databases. Our method re-ranks molecular formula candidates by considering joint fragments and losses between fragmentation trees. We employ Bayesian statistics and Gibbs sampling. Thorough algorithm engineering ensures fast processing in practice. We evaluate ZODIAC on five datasets, producing results substantially (up to 16.5-fold) better than for several other methods, including SIRIUS, which is the state-of-the-art algorithm for molecular formula annotation at present. Finally, we report and verify several novel molecular formulas annotated by ZODIAC.
- To infer a previously unknown molecular formula from mass spectrometry data is a challenging, yet neglected problem. Ludwig and colleagues present a network-based approach to ranking possible formulas.
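The re-ranking idea in the abstract (candidates for different compounds support each other when their fragmentation trees share fragments and losses, and Gibbs sampling estimates which candidate is most probable for each compound) can be illustrated with a toy sketch. This is not ZODIAC's implementation or scoring: the function `gibbs_rerank`, the `priors` and `similarity` inputs, the `exp(sum of weights)` support term, and all numbers are assumptions made up for illustration.

```python
import math
import random
from collections import Counter


def gibbs_rerank(priors, similarity, sweeps=2000, burn_in=500, seed=0):
    """Toy Gibbs sampler over one formula candidate per compound.

    priors:     {compound: {formula: prior score}}, e.g. a per-spectrum score
                (hypothetical input, not ZODIAC's actual prior).
    similarity: {frozenset({(compound_a, formula_a), (compound_b, formula_b)}): weight},
                a stand-in for "shared fragments and losses" between two candidates.
    Returns the estimated marginal probability of each candidate.
    """
    rng = random.Random(seed)
    compounds = list(priors)
    # Start from the best-scoring candidate of every compound.
    state = {c: max(priors[c], key=priors[c].get) for c in compounds}
    counts = {c: Counter() for c in compounds}

    for sweep in range(sweeps):
        for c in compounds:
            formulas = list(priors[c])
            weights = []
            for f in formulas:
                # Support from the candidates currently active for the other compounds.
                support = sum(
                    similarity.get(frozenset({(c, f), (o, state[o])}), 0.0)
                    for o in compounds
                    if o != c
                )
                weights.append(priors[c][f] * math.exp(support))
            # Resample this compound's candidate from its conditional distribution.
            state[c] = rng.choices(formulas, weights=weights, k=1)[0]
        if sweep >= burn_in:
            for c in compounds:
                counts[c][state[c]] += 1

    kept = sweeps - burn_in
    return {c: {f: counts[c][f] / kept for f in priors[c]} for c in compounds}


# Made-up example: compound "A" has two candidates, compound "B" has one.
priors = {
    "A": {"C6H12O6": 0.4, "C5H8N4O3": 0.6},
    "B": {"C12H22O11": 1.0},
}
similarity = {frozenset({("A", "C6H12O6"), ("B", "C12H22O11")}): 2.0}
posterior = gibbs_rerank(priors, similarity)
```

In this made-up example the candidate with the lower prior for compound "A" ends up ranked first, because it is the one supported by compound "B"; analytically its conditional probability is 0.4·e² / (0.4·e² + 0.6) ≈ 0.83, and the sampled estimate should land close to that value.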
- Subjects :
- 0301 basic medicine
Structure (mathematical logic)
Database
Computer Networks and Communications
Computer science
Algorithm engineering
Ranking (information retrieval)
Human-Computer Interaction
Bayesian statistics
03 medical and health sciences
Identification (information)
Annotation
030104 developmental biology
0302 clinical medicine
Artificial Intelligence
Zodiac
Computer Vision and Pattern Recognition
030217 neurology & neurosurgery
Software
Gibbs sampling
Details
- ISSN :
- 2522-5839
- Volume :
- 2
- Database :
- OpenAIRE
- Journal :
- Nature Machine Intelligence
- Accession number :
- edsair.doi...........982a74b64450bec4a629b2701d46b024
- Full Text :
- https://doi.org/10.1038/s42256-020-00234-6