CAMIL: Clustering and Assembly with Multiple Instance Learning for phenotype prediction

Authors :: Nathan LaPierre
Huzefa Rangwala
Mohammad Arifur Rahman
Source :: BIBM
Publication Year :: 2016
Publisher :: IEEE, 2016.
Abstract: The recent advent of Metagenome-Wide Association Studies (MGWAS) has allowed for increased accuracy in the prediction of patient phenotype (disease), but has also presented big data challenges. Meanwhile, Multiple Instance Learning (MIL) is useful in the domain of bioinformatics because, in addition to classifying patient phenotype, it can also identify individual parts of the microbiome that are indicative of that phenotype, leading to better understanding of the disease. We demonstrate a novel, efficient, and effective MIL-based computational pipeline to predict patient phenotype from MGWAS data. Specifically, we use a Bag of Words method, which has been shown to be one of the most effective and efficient MIL methods. This involves assembly of the metagenomic sequence data, clustering of the assembled contigs, extracting features from the contigs, and using an SVM classifier to predict patient labels and identify the most relevant read clusters. With the exception of the given labels for the patients, this entire process is de novo (unsupervised). We use data from a well-known MGWAS study of patients with Type-2 Diabetes and show that our pipeline significantly outperforms the classifier used in that paper, as well as other common MIL methods. We call our pipeline “CAMIL”, which stands for Clustering and Assembly with Multiple Instance Learning.
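
The bag-of-words step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D "contig feature" vectors, the patient names, and every function name below are hypothetical, the assembly step is omitted entirely, and the SVM classifier is swapped for a simple nearest-class-mean rule so that the example needs only the Python standard library. Each patient is treated as a bag of instances, the instances from all patients are clustered, and each patient is then represented by the fraction of their instances that fall into each cluster.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic seeding (k evenly spaced input points)."""
    centroids = [points[(i * len(points)) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def bag_to_histogram(bag, centroids):
    """Bag-of-words vector: fraction of the bag's instances in each cluster."""
    counts = Counter(nearest(inst, centroids) for inst in bag)
    return [counts[i] / len(bag) for i in range(len(centroids))]

def class_means(features, labels):
    """Mean histogram per class (stand-in for training an SVM)."""
    means = {}
    for label in set(labels.values()):
        rows = [features[n] for n in features if labels[n] == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def predict(bag, centroids, means):
    """Label whose mean histogram is closest to the bag's histogram."""
    hist = bag_to_histogram(bag, centroids)
    return min(means, key=lambda label: sq_dist(hist, means[label]))

# Hypothetical toy data: each patient is a bag of 2-D instance vectors.
bags = {
    "patient_1": [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1)],
    "patient_2": [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (0.1, 0.1)],
    "patient_3": [(5.1, 5.0), (4.9, 5.2), (5.0, 4.8), (0.1, 0.0)],
    "patient_4": [(5.2, 5.1), (5.0, 5.0), (4.8, 4.9), (5.1, 4.9)],
}
labels = {"patient_1": 0, "patient_2": 0, "patient_3": 1, "patient_4": 1}

# Cluster all instances, ignoring patient labels (the unsupervised part).
all_instances = [inst for bag in bags.values() for inst in bag]
centroids = kmeans(all_instances, k=2)

# One fixed-length feature vector per patient, then a simple classifier.
features = {name: bag_to_histogram(bag, centroids) for name, bag in bags.items()}
means = class_means(features, labels)
new_patient = [(0.0, 0.0), (5.0, 5.0), (5.1, 5.1)]
predicted = predict(new_patient, centroids, means)
```

Because every bag is reduced to a fixed-length histogram, any standard classifier can be trained on the result. In a setting like the one the abstract describes, the per-cluster weights of a linear classifier would be one plausible way to rank clusters by relevance to the phenotype, though this sketch does not attempt that.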

Subjects :: 0301 basic medicine
business.industry
Computer science
Big data
Feature extraction
Genomics
Machine learning
computer.software_genre
Support vector machine
03 medical and health sciences
030104 developmental biology
Metagenomics
Bag-of-words model
Artificial intelligence
business
Cluster analysis
computer
Classifier (UML)

Details

Database :: OpenAIRE
Journal :: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Accession number :: edsair.doi...........c019c97953dc21f23645d08e28524edd
Full Text :: https://doi.org/10.1109/bibm.2016.7822489
