Bastion3: a two-layer ensemble predictor of type III secreted effectors
- Source :
- Bioinformatics
- Publication Year :
- 2018
Abstract
- Motivation: Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen–host interactions, significant computational efforts have been put toward identification of T3SEs, and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, the existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (in some cases also incorporating the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employed only a few features, most of which were extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to effectively integrate these into a comprehensive model.
Results: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models based on these features, and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted the models' performance through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 achieves much better performance than commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, outperforming all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit, maximizing convenience for experimental scientists toward T3SE prediction. With its improved performance and its design to ease future discoveries of novel T3SEs, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
Availability and implementation: http://bastion3.erc.monash.edu/
Contact: selkrig@embl.de or wyztli@163.com or trevor.lithgow@monash.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
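The four benchmark figures quoted in the abstract (ACC, F-value, MCC and AUC) are standard binary-classification metrics. The sketch below shows one common way to compute them, assuming that "F-value" denotes the F1-score and that AUC is the area under the ROC curve. The function names, confusion-matrix counts and scores are hypothetical illustrations; they are not taken from the paper and do not reproduce its results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, F-value (assumed to be F1) and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f_value, "MCC": mcc}


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts for a test set of 100 effectors and 100 non-effectors.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)

# Hypothetical predicted scores for three effectors and two non-effectors.
auc = roc_auc([0.9, 0.8, 0.3], [0.7, 0.2])

print(metrics, auc)
```

MCC uses all four cells of the confusion matrix, so it tends to separate methods more sharply than ACC does, which may be why the margin reported for MCC (12.4%) is larger than the margins for the other metrics.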
- Subjects :
- Statistics and Probability
Gram-negative bacteria
Computer science
Value (computer science)
Machine learning
Biochemistry
Bacterial protein
03 medical and health sciences
Protein sequencing
Bacterial Proteins
Genetic algorithm
Secretion
Amino Acid Sequence
Molecular Biology
Peptide sequence
030304 developmental biology
0303 health sciences
biology
030302 biochemistry & molecular biology
Computational Biology
Ensemble learning
Original Papers
Computer Science Applications
Computational Mathematics
Identification (information)
Computational Theory and Mathematics
Host cell cytoplasm
Benchmark (computing)
Artificial intelligence
Algorithms
Software
Details
- ISSN :
- 1367-4811
- Volume :
- 35
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....e35cfe42348241db73c6d76a8dccc446