Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods
- Source :
- Data in Brief, Elsevier, 2016, 6, pp.286-294. ⟨10.1016/j.dib.2015.11.063⟩
- Publication Year :
- 2015
Abstract
- This data article describes a controlled, spiked proteomic dataset for which the “ground truth” of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and a low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, for testing new algorithms for label-free quantitative analysis, and for evaluating downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier http://www.ebi.ac.uk/pride/archive/projects/PXD001819. Starting from a subset of the raw files of this dataset, we also provide processed data obtained with various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and the processed data are merged into a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.
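As a rough illustration of how a known ground truth supports this kind of benchmarking, the sketch below scores a list of proteins that a pipeline called "variant" against the set of spiked proteins. It is a minimal sketch only: the function name `benchmark_calls` and all protein identifiers are hypothetical and are not taken from the dataset, and the metrics use the common definitions (sensitivity = TP / (TP + FN), false discovery proportion = FP / (TP + FP)), which may differ from those used in the accompanying manuscript.

```python
def benchmark_calls(called_variant, true_variant, all_proteins):
    """Score variant calls against a known ground truth.

    called_variant: proteins the pipeline reported as differentially abundant
    true_variant:   proteins known to vary (here, the spiked standard proteins)
    all_proteins:   every quantified protein (spiked standard plus background)
    """
    universe = set(all_proteins)
    called = set(called_variant) & universe
    truth = set(true_variant) & universe

    tp = len(called & truth)   # spiked proteins correctly called variant
    fp = len(called - truth)   # background proteins wrongly called variant
    fn = len(truth - called)   # spiked proteins that were missed

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fdp = fp / (tp + fp) if (tp + fp) else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "sensitivity": sensitivity, "FDP": fdp}


# Hypothetical toy identifiers, NOT real accessions from the dataset.
all_proteins = ["UPS_A", "UPS_B", "UPS_C", "UPS_D",
                "YEAST_1", "YEAST_2", "YEAST_3", "YEAST_4"]
true_variant = ["UPS_A", "UPS_B", "UPS_C", "UPS_D"]
called_variant = ["UPS_A", "UPS_B", "UPS_C", "YEAST_1"]

result = benchmark_calls(called_variant, true_variant, all_proteins)
print(result)
```

In this toy case three of the four spiked proteins are recovered and one background protein is called by mistake, so the sensitivity is 0.75 and the false discovery proportion is 0.25. With real processed data, the call list would come from whichever statistical test and threshold is being evaluated.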
- Subjects :
- 0301 basic medicine
False discovery rate
Data processing
Ground truth
Multidisciplinary
Computer science
business.industry
[SDV]Life Sciences [q-bio]
Context (language use)
lcsh:Computer applications to medicine. Medical informatics
computer.software_genre
Identifier
03 medical and health sciences
030104 developmental biology
Workflow
Software
ComputingMethodologies_PATTERNRECOGNITION
lcsh:R858-859.7
Spike (software development)
Data mining
lcsh:Science (General)
business
computer
lcsh:Q1-390
Data Article
Details
- ISSN :
- 23523409
- Volume :
- 6
- Database :
- OpenAIRE
- Journal :
- Data in brief
- Accession number :
- edsair.doi.dedup.....e5401e07d0c8b69889bf871c8b94567d
- Full Text :
- https://doi.org/10.1016/j.dib.2015.11.063