9 results for "Alex Hawkins-Hooker"
Search Results
2. Getting personal with epigenetics: towards individual-specific epigenomic imputation with machine learning
- Author
Alex Hawkins-Hooker, Giovanni Visonà, Tanmayee Narendra, Mateo Rojas-Carulla, Bernhard Schölkopf, and Gabriele Schweikert
- Subjects
Science
- Abstract
Epigenetic modifications are dynamic mechanisms involved in the regulation of gene expression. Unlike the DNA sequence, epigenetic patterns vary not only between individuals, but also between different cell types within an individual. Environmental factors, somatic mutations and ageing contribute to epigenetic changes that may constitute early hallmarks or causal factors of disease. Epigenetic modifications are reversible and thus promising therapeutic targets for precision medicine. However, mapping efforts to determine an individual’s cell-type-specific epigenome are constrained by experimental costs and tissue accessibility. To address these challenges, we developed eDICE, an attention-based deep learning model that is trained to impute missing epigenomic tracks by conditioning on observed tracks. Using a recently published set of epigenomes from four individual donors, we show that transfer learning across individuals allows eDICE to successfully predict individual-specific epigenetic variation even in tissues that are unmapped in a given donor. These results highlight the potential of machine learning-based imputation methods to advance personalized epigenomics.
- Published
- 2023
- Full Text
- View/download PDF
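The eDICE record above describes an attention-based model that imputes a missing (cell type, assay) track by conditioning on the tracks observed at the same genomic position. A minimal sketch of that idea follows; the class name TrackImputer, the factorised cell-type and assay embeddings, the use of torch.nn.MultiheadAttention, and all dimensions are assumptions chosen for illustration, not the published eDICE implementation.

```python
# Minimal sketch (assumed architecture, not the published eDICE code):
# predict the signal of an unobserved (cell type, assay) track at one genomic
# bin by attending over the tracks that were observed at that bin.
import torch
import torch.nn as nn


class TrackImputer(nn.Module):
    def __init__(self, n_cell_types, n_assays, d_model=64, n_heads=4):
        super().__init__()
        self.cell_emb = nn.Embedding(n_cell_types, d_model)
        self.assay_emb = nn.Embedding(n_assays, d_model)
        self.signal_proj = nn.Linear(1, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, obs_signal, obs_cell, obs_assay, tgt_cell, tgt_assay):
        # obs_signal: (batch, n_obs) signal values of the observed tracks at this bin
        # obs_cell, obs_assay: (batch, n_obs) integer ids identifying those tracks
        # tgt_cell, tgt_assay: (batch,) ids of the track to impute
        obs = (self.signal_proj(obs_signal.unsqueeze(-1))
               + self.cell_emb(obs_cell) + self.assay_emb(obs_assay))
        query = (self.cell_emb(tgt_cell) + self.assay_emb(tgt_assay)).unsqueeze(1)
        ctx, _ = self.attn(query, obs, obs)  # attend over the observed tracks
        return self.out(ctx)[:, 0, 0]        # predicted signal, shape (batch,)


model = TrackImputer(n_cell_types=10, n_assays=12)
pred = model(torch.rand(2, 30),
             torch.randint(0, 10, (2, 30)), torch.randint(0, 12, (2, 30)),
             torch.tensor([3, 7]), torch.tensor([5, 1]))  # pred.shape == (2,)
```

In a formulation like this, transfer to a new donor would amount to reusing the learned cell-type and assay embeddings while fitting the donor's observed tracks, which is one plausible reading of the cross-individual results described above.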
3. The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles
- Author
Jacob Matthew Schreiber, Carles A. Boix, Jin Wook Lee, Hongyang Li, Yuanfang Guan, Chun-Chieh Chang, Jen-Chien Chang, Alex Hawkins-Hooker, Bernhard Schölkopf, Gabriele Schweikert, Mateo Rojas-Carulla, Arif Canakoglu, Francesco Guzzo, Luca Nanni, Marco Masseroli, Mark James Carman, Pietro Pinoli, Chenyang Hong, Kevin Y. Yip, Jeffrey P. Spence, Sanjit Singh Batra, Yun S. Song, Shaun Mahony, Zheng Zhang, Wuwei Tan, Yang Shen, Yuanfei Sun, Minyi Shi, Jessika Adrian, Richard S. Sandstrom, Nina P. Farrell, Jessica M. Halow, Kristen Lee, Lixia Jiang, Xinqiong Yang, Charles B. Epstein, J. Seth Strattan, Bradley E. Bernstein, Michael P. Snyder, Manolis Kellis, William S. Noble, Anshul Bharat Kundaje, and ENCODE Imputation Challenge Participants
- Subjects
Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
- Published
- 2023
- Full Text
- View/download PDF
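Because this record concerns how imputation methods are scored rather than a particular architecture, the sketch below illustrates only the evaluation side: computing a few commonly used track-level measures for an imputed versus observed signal and checking how strongly the measures agree across tracks. The measure names, the 1% threshold, and the synthetic data are illustrative assumptions, not the challenge's official scoring code.

```python
# Illustrative evaluation sketch (not the official challenge scoring code).
import numpy as np
from scipy.stats import pearsonr, spearmanr


def track_measures(observed, imputed, top_frac=0.01):
    top = observed >= np.quantile(observed, 1 - top_frac)  # high-signal bins only
    return {
        "mse": float(np.mean((observed - imputed) ** 2)),
        "mse_top1pct": float(np.mean((observed[top] - imputed[top]) ** 2)),
        "pearson": float(pearsonr(observed, imputed)[0]),
        "spearman": float(spearmanr(observed, imputed)[0]),
    }


rng = np.random.default_rng(0)
scores = []
for _ in range(20):                   # pretend we have 20 (track, prediction) pairs
    obs = rng.gamma(2.0, 1.0, size=50_000)
    imp = obs + rng.normal(0.0, 0.5, size=obs.shape)
    scores.append(track_measures(obs, imp))

names = list(scores[0])
values = np.array([[s[n] for n in names] for s in scores])
print(names)
print(np.corrcoef(values.T).round(2))  # high off-diagonal values = redundant measures
```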
4. Generating functional protein variants with variational autoencoders.
- Author
Alex Hawkins-Hooker, Florence Depardieu, Sebastien Baur, Guillaume Couairon, Arthur Chen, and David Bikard
- Subjects
Biology (General) ,QH301-705.5 - Abstract
The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
- Published
- 2021
- Full Text
- View/download PDF
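The record above describes generating protein variants with VAEs trained on aligned (MSA VAE) or raw (AR-VAE) sequence input. A minimal sketch of a VAE over one-hot-encoded aligned sequences follows; the layer sizes, the alignment length of 360, and the 21-letter alphabet (20 amino acids plus a gap) are assumptions for illustration, not the authors' released model, and the raw-sequence AR-VAE variant would additionally require an autoregressive decoder.

```python
# Minimal sketch of a VAE over one-hot-encoded aligned protein sequences.
# Dimensions and layer sizes are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

L, A, Z = 360, 21, 16  # alignment length, alphabet size (incl. gap), latent dimension


class MSAVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(L * A, 512), nn.ReLU())
        self.mu, self.logvar = nn.Linear(512, Z), nn.Linear(512, Z)
        self.dec = nn.Sequential(nn.Linear(Z, 512), nn.ReLU(), nn.Linear(512, L * A))

    def forward(self, x):                           # x: (batch, L, A) one-hot floats
        h = self.enc(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        logits = self.dec(z).view(-1, L, A)
        recon = F.cross_entropy(logits.transpose(1, 2), x.argmax(-1), reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())
        return recon + kl                           # negative ELBO for the batch

    @torch.no_grad()
    def sample(self, n):                            # decode new sequences from the prior
        logits = self.dec(torch.randn(n, Z)).view(n, L, A)
        return logits.argmax(-1)                    # (n, L) residue / gap indices


vae = MSAVAE()
x = F.one_hot(torch.randint(0, A, (8, L)), A).float()  # toy batch of aligned sequences
loss = vae(x)                                           # train by minimising this
new_variants = vae.sample(5)                            # 5 novel aligned sequences
```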
5. Moment Matching Denoising Gibbs Sampling.
- Author
Mingtian Zhang, Alex Hawkins-Hooker, Brooks Paige, and David Barber
- Published
- 2023
6. Projection layers improve deep learning models of regulatory DNA function [version 1; peer review: 1 approved, 1 approved with reservations]
- Author
Alex Hawkins-Hooker, Henry Kenlay, and John E. Reid
- Subjects
Method Article, Articles, sequence analysis, deep learning, gene regulation
- Abstract
With the increasing application of deep learning methods to the modelling of regulatory DNA sequences has come an interest in exploring what types of architecture are best suited to the domain. Networks designed to predict many functional characteristics of noncoding DNA in a multitask framework have to recognise a large number of motifs and as a result benefit from large numbers of convolutional filters in the first layer. The use of large first layers in turn motivates an exploration of strategies for addressing the sparsity of output and possibility for overfitting that result. To this end we propose the use of a dimensionality-reducing linear projection layer after the initial motif-recognising convolutions. In experiments with a reduced version of the DeepSEA dataset we find that inserting this layer in combination with dropout into convolutional and convolutional-recurrent architectures can improve predictive performance across a range of first layer sizes. We further validate our approach by incorporating the projection layer into a new convolutional-recurrent architecture which achieves state of the art performance on the full DeepSEA dataset. Analysis of the learned projection weights shows that the inclusion of this layer simplifies the network’s internal representation of the occurrence of motifs, notably by projecting features representing forward and reverse-complement motifs to similar positions in the lower dimensional feature space output by the layer.
- Published
- 2019
- Full Text
- View/download PDF
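The record above proposes a dimensionality-reducing linear projection layer, combined with dropout, after a wide motif-scanning first convolution. The sketch below shows one way such a layer can slot into a multitask model of one-hot DNA, implemented here as a 1x1 convolution (a position-wise linear projection across filters); the filter counts, kernel sizes, and the 919-task output (the DeepSEA target count) are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a motif-scanning CNN with a dimensionality-reducing
# projection layer and dropout inserted after the wide first convolution.
# Filter counts, kernel sizes and the task count are illustrative assumptions.
import torch
import torch.nn as nn


class ProjectionCNN(nn.Module):
    def __init__(self, n_first_filters=1024, n_projected=128, n_tasks=919):
        super().__init__()
        self.motif_conv = nn.Conv1d(4, n_first_filters, kernel_size=19, padding=9)
        # a 1x1 convolution is a position-wise linear projection of the feature vector
        self.project = nn.Conv1d(n_first_filters, n_projected, kernel_size=1)
        self.drop = nn.Dropout(0.2)
        self.body = nn.Sequential(
            nn.MaxPool1d(4),
            nn.Conv1d(n_projected, 256, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(256, n_tasks))

    def forward(self, x):                # x: (batch, 4, sequence_length) one-hot DNA
        h = torch.relu(self.motif_conv(x))
        h = self.drop(self.project(h))   # reduce channel dimension, then dropout
        return self.body(h)              # per-task logits


logits = ProjectionCNN()(torch.rand(2, 4, 1000))  # logits.shape == (2, 919)
```

The same projection block could feed a recurrent layer instead of the second convolution to mirror the convolutional-recurrent variant discussed in the abstract.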
7. Getting Personal with Epigenetics: Towards Machine-Learning-Assisted Precision Epigenomics
- Author
Alex Hawkins-Hooker, Giovanni Visonà, Tanmayee Narendra, Mateo Rojas-Carulla, Bernhard Schölkopf, and Gabriele Schweikert
- Abstract
Epigenetic modifications are dynamic control mechanisms involved in the regulation of gene expression. Unlike the DNA sequence itself, they vary not only between individuals but also between different cell types of the same individual. Exposure to environmental factors, somatic mutations, and ageing contribute to epigenomic changes over time, which may constitute early hallmarks or causal factors of disease. Epigenetic changes are reversible and, therefore, promising therapeutic targets. However, mapping efforts to determine an individual’s cell-type-specific epigenome are constrained by experimental costs. We developed eDICE, an attention-based deep learning model, to impute epigenomic tracks. eDICE achieves improved overall performance compared to previous models on the reference Roadmap epigenomes. Furthermore, we present a proof of concept for the imputation of personalised epigenomic measurements on the ENTEx dataset, where eDICE correctly predicts individual- and cell-type-specific epigenetic patterns. This case study constitutes an important step towards robustly employing machine-learning-based approaches for personalised epigenomics.
- Published
- 2022
- Full Text
- View/download PDF
8. Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay
- Author
Ron Unger, Jay Shendure, Ayoti Patra, Beth Martin, Henry Kenlay, Zhongxia Yan, Anat Kreimer, Michael A. Beer, Nir Yosef, Dmitry Penzar, Martin Kircher, Max Schubach, Tamar Juven-Gershon, John E. Reid, Alan P. Boyle, Alex Hawkins-Hooker, Aashish N. Adhikari, Orit Adato, Nadav Ahituv, Ivan V. Kulakovskiy, Fumitaka Inoue, Chenling Xiong, Shengcheng Dong, and Dustin Shigaki
- Subjects
Epigenomics, Enhancer Elements, Base pair, Clinical Sciences, promoters, Computational biology, Biology, Article, Cell Line, Promoter Regions, Machine Learning, 03 medical and health sciences, Genetic, Genetics, Humans, Point Mutation, 2.1 Biological and endogenous factors, Genetic Predisposition to Disease, Aetiology, Promoter Regions, Genetic, Saturated mutagenesis, Enhancer, Genetics (clinical), 030304 developmental biology, Genetics & Heredity, 0303 health sciences, Reporter gene, Binding Sites, regulatory variation, 030305 genetics & heredity, Human Genome, Promoter, DNA, MPRA, Chromatin, DNA binding site, Enhancer Elements, Genetic, enhancers, Generic health relevance, gene regulation, Transcription Factors
- Abstract
The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.
- Published
- 2019
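The assay described above measures each variant's regulatory activity as reporter RNA relative to plasmid DNA, and submissions were scored on how well they predicted those effects. The sketch below illustrates only that bookkeeping: turning RNA/DNA counts into log-ratio effect sizes and comparing them with model predictions by rank correlation. The column names, the pseudocount of 1, and the choice of Spearman correlation are illustrative assumptions, not the CAGI 5 scoring pipeline.

```python
# Illustrative sketch: MPRA counts -> variant effect sizes -> comparison with
# predictions. Column names and scoring choices are assumptions, not CAGI 5 code.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Assumed layout: one row per single-nucleotide variant, with RNA/DNA counts for
# the variant construct and for the unmutated reference of the same element.
df = pd.DataFrame({
    "rna":     [120, 85, 40, 300],
    "dna":     [100, 90, 95, 110],
    "ref_rna": [110, 110, 110, 110],
    "ref_dna": [100, 100, 100, 100],
    "predicted_effect": [0.1, -0.2, -1.0, 1.2],
})

df["log_ratio"] = np.log2((df.rna + 1) / (df.dna + 1))            # reporter activity
df["ref_log_ratio"] = np.log2((df.ref_rna + 1) / (df.ref_dna + 1))
df["effect"] = df.log_ratio - df.ref_log_ratio                    # variant effect size

rho, _ = spearmanr(df.predicted_effect, df.effect)
print(f"rank correlation between predicted and measured effects: {rho:.2f}")
```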
9. Projection layers improve deep learning models of regulatory DNA function
- Author
John E. Reid, Alex Hawkins-Hooker, and Henry Kenlay
- Subjects
business.industry, Computer science, Feature vector, Deep learning, Artificial intelligence, Function (mathematics), Overfitting, Layer (object-oriented design), business, Representation (mathematics), Algorithm, Projection (linear algebra), Dropout (neural networks)
- Abstract
With the increasing application of deep learning methods to the modelling of regulatory DNA sequences has come an interest in exploring what types of architecture are best suited to the domain. Networks designed to predict many functional characteristics of noncoding DNA in a multitask framework have to recognise a large number of motifs and as a result benefit from large numbers of convolutional filters in the first layer. The use of large first layers in turn motivates an exploration of strategies for addressing the sparsity of output and possibility for overfitting that result. To this end we propose the use of a dimensionality-reducing linear projection layer after the initial motif-recognising convolutions. In experiments with a reduced version of the DeepSEA dataset we find that inserting this layer in combination with dropout into convolutional and convolutional-recurrent architectures can improve predictive performance across a range of first layer sizes. We further validate our approach by incorporating the projection layer into a new convolutional-recurrent architecture which achieves state of the art performance on the full DeepSEA dataset. Analysis of the learned projection weights shows that the inclusion of this layer simplifies the network’s internal representation of the occurrence of motifs, notably by projecting features representing forward and reverse-complement motifs to similar positions in the lower dimensional feature space output by the layer.
- Published
- 2018
- Full Text
- View/download PDF