
Bastion3: a two-layer ensemble predictor of type III secreted effectors

Authors :
Tatiana T. Marquez-Lago
Jiawei Wang
Trevor Lithgow
Kuo-Chen Chou
Joel Selkrig
Jiahui Li
Jiangning Song
André Leier
Tatsuya Akutsu
Yanju Zhang
Bingjiao Yang
Tieli Zhou
Ruopeng Xie
Morihiro Hayashida
Source :
Bioinformatics
Publication Year :
2018

Abstract

Motivation: Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen–host interactions, significant computational effort has been devoted to identifying T3SEs, and these tools have in turn stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (sometimes together with the C-terminus) rather than on the complete protein sequence, and (ii) the underlying models, trained with classic algorithms, employ only a few features, most of which are extracted from sequence information alone. To achieve better T3SE prediction, more powerful and informative features must be identified, and we must investigate how to integrate them effectively into a comprehensive model.

Results: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains individual models based on these features and finally integrates these models through ensemble learning. We trained the models using LightGBM, a new gradient boosting machine, and further improved their performance through a novel two-step parameter optimization strategy based on a genetic algorithm (GA). Our benchmark test demonstrates that Bastion3 achieves markedly better performance than commonly used methods, with an ACC of 0.959, an F-value of 0.958, an MCC of 0.917 and an AUC of 0.956, outperforming all other toolkits by more than 5.6% in ACC, 5.7% in F-value, 12.4% in MCC and 5.8% in AUC.
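
The abstract describes the two-layer design only at a high level: individual models trained on different kinds of features, whose outputs are then integrated through ensemble learning. A minimal sketch of that general idea, written as plain stacking, is shown below under loudly stated assumptions: the feature groups (`group_a`, `group_b`) and the toy data are hypothetical, and a hand-written logistic regression stands in both for the LightGBM models named in the abstract and for whatever combiner Bastion3's second layer actually uses. It is an illustration of the technique, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Weights = List[float]  # [bias, w_1, ..., w_d]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_logreg(w: Weights, x: Sequence[float]) -> float:
    """Probability of the positive class (T3SE) under a logistic model."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Weights:
    """Fit a logistic regression with full-batch gradient descent on log-loss.
    Stand-in for a real base learner such as LightGBM (assumption)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_logreg(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


class TwoLayerEnsemble:
    """Layer 1: one base model per feature group.
    Layer 2: a meta-model that combines the layer-1 probabilities."""

    def __init__(self, feature_groups: Dict[str, List[int]]):
        self.feature_groups = feature_groups  # hypothetical: group name -> column indices
        self.base_models: Dict[str, Weights] = {}
        self.meta_model: Weights = []

    def _layer1(self, x: Sequence[float]) -> List[float]:
        # One probability per feature group, in a fixed (insertion) order.
        return [predict_logreg(self.base_models[name], [x[i] for i in cols])
                for name, cols in self.feature_groups.items()]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "TwoLayerEnsemble":
        for name, cols in self.feature_groups.items():
            X_group = [[row[i] for i in cols] for row in X]
            self.base_models[name] = train_logreg(X_group, y)
        # Simplification: the meta-model is trained on in-sample layer-1 outputs.
        Z = [self._layer1(row) for row in X]
        self.meta_model = train_logreg(Z, y)
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        return predict_logreg(self.meta_model, self._layer1(x))


if __name__ == "__main__":
    # Toy data: four numeric features split into two hypothetical groups.
    X = [[1.0, 0.8, 0.9, 1.2], [0.9, 1.1, 1.0, 0.7], [1.2, 0.9, 0.8, 1.0],
         [-1.0, -0.9, -1.1, -0.8], [-0.8, -1.2, -0.9, -1.0], [-1.1, -1.0, -0.7, -1.2]]
    y = [1, 1, 1, 0, 0, 0]
    model = TwoLayerEnsemble({"group_a": [0, 1], "group_b": [2, 3]}).fit(X, y)
    print(round(model.predict_proba([1.0, 1.0, 1.0, 1.0]), 3))
```

One design caveat: because the meta-model here sees in-sample layer-1 predictions, it can overfit on real data; a more careful stacker would train the second layer on out-of-fold predictions obtained by cross-validation.
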
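
The four reported metrics have standard definitions in terms of the confusion-matrix counts TP, TN, FP and FN. The sketch below computes them, assuming that the abstract's "F-value" is the usual F1 score and that AUC means the area under the ROC curve; the labels, scores and the 0.5 decision threshold in the example are invented for illustration and are not Bastion3 data.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels: 1 = effector, 0 = non-effector."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    # F1 = 2PR / (P + R), rewritten as 2TP / (2TP + FP + FN).
    f_denom = 2 * tp + fp + fn
    f_value = 2 * tp / f_denom if f_denom else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 by convention if undefined.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"ACC": acc, "F": f_value, "MCC": mcc}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney form); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(threshold_metrics(y_true, y_pred))  # {'ACC': 0.75, 'F': 0.75, 'MCC': 0.5}
    print(auc(y_true, scores))  # 0.9375
```

ACC, F-value and MCC depend on a chosen decision threshold, whereas AUC is computed from the raw scores, which is why the sketch keeps them in separate functions.
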
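
The abstract names a GA-based two-step parameter optimization strategy but does not describe its steps, so the following is only a generic, single-stage genetic algorithm for hyperparameter search, included to illustrate the kind of optimization involved. The search space, the toy fitness function and every function name are hypothetical. In practice the fitness would be a cross-validated score of a LightGBM model trained with the candidate parameters, and an integer parameter such as `num_leaves` would be rounded before use.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
SearchSpace = Dict[str, Tuple[float, float]]  # parameter -> (low, high)


def random_params(space: SearchSpace, rng: random.Random) -> Params:
    """Sample one candidate uniformly from the search box."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: each parameter is copied from one parent at random."""
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in a}


def mutate(p: Params, space: SearchSpace, rng: random.Random,
           rate: float = 0.2, scale: float = 0.1) -> Params:
    """Gaussian mutation, clipped back into the search bounds."""
    child = dict(p)
    for k, (lo, hi) in space.items():
        if rng.random() < rate:
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, scale * (hi - lo))))
    return child


def genetic_search(fitness: Callable[[Params], float], space: SearchSpace,
                   pop_size: int = 20, generations: int = 30,
                   elite: int = 2, seed: int = 0) -> Params:
    rng = random.Random(seed)
    population: List[Params] = [random_params(space, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]                   # elitism: best candidates survive unchanged
        parents = ranked[: max(2, pop_size // 2)]   # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(a, b, rng), space, rng))
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical search space and a toy fitness peaking at
    # learning_rate = 0.05, num_leaves = 31 (stand-in for a cross-validated score).
    space = {"learning_rate": (0.01, 0.3), "num_leaves": (8.0, 255.0)}

    def toy_fitness(p: Params) -> float:
        return -((p["learning_rate"] - 0.05) / 0.3) ** 2 - ((p["num_leaves"] - 31) / 255) ** 2

    best = genetic_search(toy_fitness, space)
    print({k: round(v, 3) for k, v in best.items()})
```

Because the top `elite` candidates are copied into every new generation unchanged, the best fitness found can never decrease from one generation to the next; a fixed `seed` makes runs reproducible.
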
Based on the proposed two-layer ensemble model, we further developed a user-friendly online toolkit that makes T3SE prediction convenient for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.

Availability and implementation: http://bastion3.erc.monash.edu/

Contact: selkrig@embl.de or wyztli@163.com or or trevor.lithgow@monash.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Details

ISSN :
1367-4811
Volume :
35
Issue :
12
Database :
OpenAIRE
Journal :
Bioinformatics (Oxford, England)
Accession number :
edsair.doi.dedup.....e35cfe42348241db73c6d76a8dccc446