Representation learning of genomic sequence motifs with convolutional neural networks.

Authors :: Koo PK
Eddy SR
Source :: PLoS computational biology [PLoS Comput Biol] 2019 Dec 19; Vol. 15 (12), pp. e1007560. Date of Electronic Publication: 2019 Dec 19 (Print Publication: 2019).
Publication Year :: 2019
Abstract: Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first-layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs (assembling partial features into whole features in deeper layers) tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.
Competing Interests: The authors have declared that no competing interests exist.
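
The abstract names two architectural choices, first-layer filter size and max-pooling. As a minimal sketch of the mechanics only (not the authors' code or experiments), the plain-Python example below scans a one-hot DNA sequence with a single filter and then pools the scores with a small and a large window. The motif `GATAA`, the sequence, and the pool sizes are made-up values chosen for illustration. With the large window the pooled output keeps only whether the motif occurred, not where, so a deeper layer has less positional information from which to assemble partial features.

```python
# Illustrative sketch only: not the authors' implementation.
# The motif, sequence, and pool sizes are hypothetical demonstration values.

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values (order A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def conv1d_scan(x, filt):
    """Slide a (width x 4) filter along the sequence ('valid' positions), then apply ReLU."""
    width = len(filt)
    scores = []
    for start in range(len(x) - width + 1):
        s = sum(filt[i][j] * x[start + i][j] for i in range(width) for j in range(4))
        scores.append(max(s, 0.0))
    return scores

def max_pool(scores, pool_size):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(scores[i:i + pool_size])
            for i in range(0, len(scores) - pool_size + 1, pool_size)]

motif = "GATAA"                 # assumed example motif
filt = one_hot(motif)           # a filter that matches the whole motif exactly
seq = "ACGTGATAACCGTTAGC"       # assumed example sequence containing the motif once

scores = conv1d_scan(one_hot(seq), filt)
small_pool = max_pool(scores, 2)    # keeps coarse positional information
large_pool = max_pool(scores, 12)   # collapses nearly all positional information

print("scan scores:", scores)
print("pool size 2:", small_pool)
print("pool size 12:", large_pool)
```

In a trained network the filter weights are learned rather than set by hand; the hand-built filter here only shows what a whole-motif (localist) first-layer filter would respond to.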

Subjects :: Amino Acid Motifs
Binding Sites genetics
Computational Biology
Computer Simulation
DNA genetics
Databases, Genetic statistics & numerical data
Deep Learning statistics & numerical data
Genome, Human
Humans
Transcription Factors chemistry
Transcription Factors genetics
Transcription Factors metabolism
Genomics statistics & numerical data
Neural Networks, Computer
