Aggregation Tool for Genomic Concepts (ATGC): A deep learning framework for sparse genomic measures and its application to somatic mutations

Authors :
Alexander S. Baras
Craig Cummings
John-William Sidhom
Jordan Anaya
Publication Year :
2020
Publisher :
Cold Spring Harbor Laboratory, 2020.

Abstract

Deep learning has the ability to extract meaningful features from data given enough training examples. Large-scale genomic data are well suited for this class of machine learning algorithms; however, for many of these data the labels are at the level of the sample instead of at the level of the individual genomic measures and features. To leverage the power of deep learning for these types of data, we turn to a multiple instance learning framework and present an easily extensible tool built with TensorFlow and Keras. We show how this tool can be applied to somatic variants (featurizing genomic position, sequence context, and read counts) on a range of artificial tasks (classification, regression, Cox regression). In addition, we confirm the model can achieve high performance on real-world problems, accurately classifying samples according to whether they contain a specific variant (hotspot or tumor suppressor), groups of variants (tumor clonality), or a type of variant (microsatellite instability). Our results suggest this framework will lead to improvements on sample-level tasks that require aggregation of a set of genomic measures.
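
The abstract does not say how instance-level features are aggregated into a sample-level prediction, so the following is only a minimal sketch of the general multiple instance learning idea, not the ATGC implementation. It assumes attention-weighted pooling, one common choice in the multiple instance learning literature, and uses plain Python rather than TensorFlow and Keras to stay self-contained. The function names, the three-number variant encoding, and all weight values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_bag(instances, attention_weights, output_weights):
    """Pool a variable-size bag of instance vectors into one sample-level score.

    instances: list of feature vectors, one per variant (hypothetical encoding)
    attention_weights: hypothetical learned vector that scores each instance
    output_weights: hypothetical learned vector mapping the pooled vector to a logit
    Assumes the bag contains at least one instance.
    """
    # Score every instance, then normalise the scores across the bag.
    scores = [sum(w * x for w, x in zip(attention_weights, inst)) for inst in instances]
    alphas = softmax(scores)
    # Attention-weighted sum of instance features gives a fixed-size sample vector.
    dim = len(instances[0])
    pooled = [sum(a * inst[d] for a, inst in zip(alphas, instances)) for d in range(dim)]
    # Sample-level prediction: sigmoid of a linear readout of the pooled vector.
    logit = sum(w * p for w, p in zip(output_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit)), alphas


# Two samples with different numbers of variants share the same weights.
att = [1.0, -1.0, 0.5]   # made-up values standing in for trained parameters
out = [0.5, 0.5, -1.0]   # made-up values standing in for trained parameters
sample_a = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]]
sample_b = [[0.4, 0.4, 0.1], [0.7, 0.1, 0.9], [0.2, 0.6, 0.6]]
prob_a, alphas_a = aggregate_bag(sample_a, att, out)
prob_b, alphas_b = aggregate_bag(sample_b, att, out)
```

Because the pooling is a weighted sum over the bag, the prediction does not depend on the order of the variants, and only the sample-level label is needed to train the weights. In a real model both weight vectors would be learned jointly with the instance encoder.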

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........8096458d22a5f604ecbef6f51f72cf07