CAMIL: Clustering and Assembly with Multiple Instance Learning for phenotype prediction

Authors :
Nathan LaPierre
Huzefa Rangwala
Mohammad Arifur Rahman
Source :
BIBM
Publication Year :
2016
Publisher :
IEEE, 2016.

Abstract

The recent advent of Metagenome-Wide Association Studies (MGWAS) has allowed for increased accuracy in the prediction of patient phenotype (disease), but has also presented big data challenges. Meanwhile, Multiple Instance Learning (MIL) is useful in the domain of bioinformatics because, in addition to classifying patient phenotype, it can also identify individual parts of the microbiome that are indicative of that phenotype, leading to better understanding of the disease. We demonstrate a novel, efficient, and effective MIL-based computational pipeline to predict patient phenotype from MGWAS data. Specifically, we use a Bag of Words method, which has been shown to be one of the most effective and efficient MIL methods. This involves assembly of the metagenomic sequence data, clustering of the assembled contigs, extracting features from the contigs, and using an SVM classifier to predict patient labels and identify the most relevant read clusters. With the exception of the given labels for the patients, this entire process is de novo (unsupervised). We use data from a well-known MGWAS study of patients with Type-2 Diabetes and show that our pipeline significantly outperforms the classifier used in that paper, as well as other common MIL methods. We call our pipeline “CAMIL”, which stands for Clustering and Assembly with Multiple Instance Learning.
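The Bag of Words step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it skips assembly and the SVM entirely, and it assumes normalized 2-mer frequencies as contig features and plain k-means as the clustering method, neither of which the abstract specifies. All function names and the toy sequences are hypothetical. Each patient is treated as a bag of contigs, every contig is assigned to its nearest cluster, and the patient is then summarized as a histogram over clusters, which is the fixed-length vector a classifier such as an SVM could be trained on.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet (assumed feature)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # guard against empty input
    return [counts[m] / total for m in kmers]


def nearest(vec, centroids):
    """Index of the centroid closest to vec in Euclidean distance."""
    return min(range(len(centroids)), key=lambda j: math.dist(vec, centroids[j]))


def kmeans(vectors, n_clusters, n_iter=20):
    """Plain Lloyd's k-means, seeded deterministically with the first vectors."""
    centroids = [list(v) for v in vectors[:n_clusters]]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def bag_histogram(bag, centroids):
    """Bag of Words embedding: fraction of a bag's instances in each cluster."""
    counts = Counter(nearest(v, centroids) for v in bag)
    return [counts[j] / len(bag) for j in range(len(centroids))]


# Toy data, made up for illustration: patient id -> assembled contigs.
bags = {
    "patient_a": ["AAAAAAAA", "AAAATAAA", "GCGCGCGC"],
    "patient_b": ["GCGCGCGC", "CGCGCGCG"],
}
profiles = {pid: [kmer_profile(c) for c in contigs] for pid, contigs in bags.items()}
all_vectors = [v for vectors in profiles.values() for v in vectors]
centroids = kmeans(all_vectors, n_clusters=2)
histograms = {pid: bag_histogram(vs, centroids) for pid, vs in profiles.items()}
```

Under these assumptions, each entry of `histograms` has one value per cluster, and a cluster that carries a large weight in a trained classifier would point back to a specific group of contigs, which is the interpretability benefit the abstract attributes to MIL.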

Details

Database :
OpenAIRE
Journal :
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Accession number :
edsair.doi...........c019c97953dc21f23645d08e28524edd
Full Text :
https://doi.org/10.1109/bibm.2016.7822489