PepLine: A Software Pipeline for High-Throughput Direct Mapping of Tandem Mass Spectrometry Data on Genomic Sequences
- Source :
- Journal of Proteome Research, American Chemical Society, 2008, 7 (5), pp.1873-1883. ⟨10.1021/pr070415k⟩
- Publication Year :
- 2008
- Publisher :
- HAL CCSD, 2008.
Abstract
- PepLine is a fully automated software pipeline that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), yielding hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component so that the whole pipeline can run in a fully automated manner on raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana chloroplast envelope samples. Our results demonstrate that PepLine is competitive with protein database search software and is fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for detecting the intron/exon structure of genes.
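The second and third modules described in the abstract (mapping tags onto six-frame translations, then clustering hits) can be illustrated with a minimal Python sketch. This is not PepLine's implementation: a real PST also carries flanking masses, which are omitted here, and the function names (`map_tag`, `cluster_hits`) and the `max_gap` parameter are hypothetical choices made for this example only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_tag(dna, tag):
    """Find exact matches of an amino-acid tag in all six reading frames.

    Returns sorted (strand, start, end) tuples, with start/end given as
    0-based, end-exclusive coordinates on the forward strand.
    """
    hits = []
    n = len(dna)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(tag)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(tag)
                if strand == "-":
                    # Convert reverse-strand positions to forward coordinates.
                    start, end = n - end, n - start
                hits.append((strand, start, end))
                pos = protein.find(tag, pos + 1)
    return sorted(hits)


def cluster_hits(hits, max_gap):
    """Group same-strand hits whose gap to the previous hit is <= max_gap."""
    clusters = []
    for hit in sorted(hits):
        last = clusters[-1] if clusters else None
        if last and last[-1][0] == hit[0] and hit[1] - last[-1][2] <= max_gap:
            last.append(hit)
        else:
            clusters.append([hit])
    return clusters
```

For example, `map_tag("ATGGCTAAAGGT", "AK")` locates the tag in forward frame 0, and passing the resulting hits for several tags to `cluster_hits` groups nearby hits into candidate coding regions. Exact string matching is used here purely for simplicity; it ignores mass tolerance and the isobaric residues I and L.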
- Subjects :
- Chloroplasts
[SDV]Life Sciences [q-bio]
Molecular Sequence Data
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Arabidopsis
Computational biology
Biology
Tandem mass spectrometry
Proteomics
01 natural sciences
Biochemistry
Genome
Mass Spectrometry
03 medical and health sciences
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Animals
Coding region
[SDV.BBM]Life Sciences [q-bio]/Biochemistry, Molecular Biology
Amino Acid Sequence
ORFs
ComputingMilieux_MISCELLANEOUS
030304 developmental biology
Genetics
0303 health sciences
Base Sequence
Arabidopsis Proteins
010401 analytical chemistry
Peptide sequence tag
General Chemistry
Genome project
0104 chemical sciences
genomic DNA
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Peptides
Sequence Alignment
Algorithms
Software
Details
- Language :
- English
- ISSN :
- 1535-3893 and 1535-3907
- Database :
- OpenAIRE
- Journal :
- Journal of Proteome Research, American Chemical Society, 2008, 7 (5), pp.1873-1883. ⟨10.1021/pr070415k⟩
- Accession number :
- edsair.doi.dedup.....85f34b80802dac6845792564d6a1e27e