Back to Search Start Over

Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry.

Authors :
Röst, Hannes L.
Schmitt, Uwe
Aebersold, Ruedi
Malmström, Lars
Source :
PLoS ONE. Apr2015, Vol. 10 Issue 4, p1-11. 11p.
Publication Year :
2015

Abstract

Motivation: In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size. Results: Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data. Availability: Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at . [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
19326203
Volume :
10
Issue :
4
Database :
Academic Search Index
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
102402405
Full Text :
https://doi.org/10.1371/journal.pone.0125108