CAMIL: Clustering and Assembly with Multiple Instance Learning for phenotype prediction
- Source :
- BIBM
- Publication Year :
- 2016
- Publisher :
- IEEE, 2016.
Abstract
- The recent advent of Metagenome-Wide Association Studies (MGWAS) has allowed for increased accuracy in the prediction of patient phenotype (disease), but has also presented big data challenges. Meanwhile, Multiple Instance Learning (MIL) is useful in the domain of bioinformatics because, in addition to classifying patient phenotype, it can also identify individual parts of the microbiome that are indicative of that phenotype, leading to better understanding of the disease. We demonstrate a novel, efficient, and effective MIL-based computational pipeline to predict patient phenotype from MGWAS data. Specifically, we use a Bag of Words method, which has been shown to be one of the most effective and efficient MIL methods. This involves assembly of the metagenomic sequence data, clustering of the assembled contigs, extracting features from the contigs, and using an SVM classifier to predict patient labels and identify the most relevant read clusters. With the exception of the given labels for the patients, this entire process is de novo (unsupervised). We use data from a well-known MGWAS study of patients with Type-2 Diabetes and show that our pipeline significantly outperforms the classifier used in that paper, as well as other common MIL methods. We call our pipeline “CAMIL”, which stands for Clustering and Assembly with Multiple Instance Learning.
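A minimal sketch of the Bag of Words MIL idea described in the abstract, not the authors' implementation. It assumes each patient is a "bag" of per-contig feature vectors, uses k-means as a stand-in clustering step, and trains a linear SVM on per-patient cluster histograms. The data is synthetic, and the helper names (`make_toy_bags`, `bag_histograms`), the feature dimension, and the cluster count are all hypothetical; assembly and real feature extraction are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC


def make_toy_bags(n_bags=40, n_features=8, seed=0):
    """Synthetic stand-in for per-patient contig features (not real MGWAS data)."""
    rng = np.random.default_rng(seed)
    bags, labels = [], []
    for i in range(n_bags):
        label = i % 2
        n_instances = int(rng.integers(20, 40))
        X = rng.normal(0.0, 1.0, size=(n_instances, n_features))
        if label == 1:
            # Assumption: positive bags contain a few shifted instances.
            X[:5] += 4.0
        bags.append(X)
        labels.append(label)
    return bags, np.array(labels)


def bag_histograms(bags, kmeans):
    """Map each bag to a normalized histogram of cluster ("word") counts."""
    k = kmeans.n_clusters
    H = np.zeros((len(bags), k))
    for i, X in enumerate(bags):
        words = kmeans.predict(X)
        H[i] = np.bincount(words, minlength=k) / len(X)
    return H


bags, y = make_toy_bags()
train, test = np.arange(0, 30), np.arange(30, 40)

# Learn the "vocabulary" from training instances only; bag labels are not used here.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
kmeans.fit(np.vstack([bags[i] for i in train]))

H_train = bag_histograms([bags[i] for i in train], kmeans)
H_test = bag_histograms([bags[i] for i in test], kmeans)

# Supervised step: one histogram per patient, one label per patient.
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(H_train, y[train])
accuracy = clf.score(H_test, y[test])

# Clusters with the largest absolute SVM weights are candidate "relevant" clusters.
ranked_clusters = np.argsort(-np.abs(clf.coef_[0]))
```

In this sketch only the SVM step sees patient labels, which mirrors the abstract's statement that everything except the labels is de novo. Ranking clusters by linear SVM weight is one simple way to surface candidate clusters and is an assumption here.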
- Subjects :
- 0301 basic medicine
business.industry
Computer science
Big data
Feature extraction
Genomics
Machine learning
computer.software_genre
Support vector machine
03 medical and health sciences
030104 developmental biology
Metagenomics
Bag-of-words model
Artificial intelligence
business
Cluster analysis
computer
Classifier (UML)
Details
- Database :
- OpenAIRE
- Journal :
- 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
- Accession number :
- edsair.doi...........c019c97953dc21f23645d08e28524edd
- Full Text :
- https://doi.org/10.1109/bibm.2016.7822489