1. Evolutionary modelling of biological structures
- Author
- Golden, Michael and Hein, Jotun
- Subjects
- 572
- Abstract
Many naturally occurring biological molecules, such as nucleic acids or proteins, are evolutionarily related, having evolved from a common ancestor via a process of sequence mutation. Mutations undergo a process of selection, with combinations of mutations that preserve or improve a molecule's fitness having a higher probability of being maintained. Mutations that modify structure are expected to particularly impact fitness, and often detectably influence observed patterns of mutations. Nucleotide coevolution is one such example, whereby pairs of nucleotides within biologically functional nucleic acid secondary structures exhibit evidence of coevolution that is consistent with the maintenance of canonical base-pairing. PINNACLE, the first of two models developed in this dissertation, is a sequence evolution model that assesses the rates of mutation associated with base-paired sites in alignments of DNA or RNA sequences. PINNACLE is able to fully account for an unknown secondary structure, and in doing so can be used to predict a secondary structure shared amongst an alignment of sequences. PINNACLE was used to infer rates of coevolution associated with GC, AU (AT in DNA), and GU (GT in DNA) dinucleotides in non-coding RNA alignments, and single-stranded RNA and DNA virus alignments. Strong evidence was found for GU dinucleotides being selectively favoured at base-paired sites in non-coding RNA and RNA virus alignments, with slight evidence for GT dinucleotides being selectively favoured at base-paired sites in DNA virus alignments. The strength of coevolution at base-paired sites in a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure and a corresponding alignment containing large numbers of HIV-1 group M sequences was also measured, finding statistically significant correlations between the experimentally determined SHAPE-MaP pairing scores and the inferred degrees of coevolution.
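As a rough illustration of what "consistent with the maintenance of canonical base-pairing" means for a pair of alignment columns, the following minimal sketch counts how often two columns form a GC, AU or GU pair. It is not PINNACLE, which is a probabilistic sequence evolution model; the function names and the toy alignment are hypothetical and chosen only for this example.

```python
# Canonical pairs named in the abstract: GC, AU and GU, in either orientation.
# DNA input is handled by treating T as U (so AT and GT map to AU and GU).
CANONICAL_PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}


def to_rna(seq):
    """Upper-case a sequence and rewrite T as U."""
    return seq.upper().replace("T", "U")


def canonical_fraction(alignment, i, j):
    """Fraction of sequences, ungapped at columns i and j, that form a canonical pair."""
    paired = 0
    total = 0
    for seq in alignment:
        rna = to_rna(seq)
        a, b = rna[i], rna[j]
        if a == "-" or b == "-":
            continue  # skip sequences with a gap at either column
        total += 1
        if (a, b) in CANONICAL_PAIRS:
            paired += 1
    return paired / total if total else 0.0


def distinct_canonical_pairs(alignment, i, j):
    """Set of distinct canonical dinucleotides observed at columns i and j."""
    observed = set()
    for seq in alignment:
        rna = to_rna(seq)
        pair = (rna[i], rna[j])
        if pair in CANONICAL_PAIRS:
            observed.add(pair)
    return observed


# Hypothetical toy alignment: columns 0 and 4 vary together, columns 1 and 2 do not pair.
toy_alignment = ["GAAAC", "AAAAT", "GAAAU", "CAAAG", "AAAAC"]
```

In the toy alignment, columns 0 and 4 change together while mostly staying pairable, which is the kind of signal a coevolution model looks for; a raw count like this ignores phylogeny, which is one reason a full evolutionary model is needed.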
The second model developed in this dissertation, ETDBN, differs from typical evolutionary models in that it models the structural evolution of proteins in addition to their sequence evolution. ETDBN uses a dihedral angle representation of protein structure, and models the evolutionary trajectory between a pair of protein structures using an angular diffusion process on the two-dimensional torus. ETDBN is trained on a large database of proteins, and is parameterised such that it can learn the dependencies between sequence and structural evolution. The model has interpretable parameters, and is more realistic than previous stochastic models of protein structure evolution. Using the trained model, we were able to identify apparent sequence-structure evolutionary motifs present in numerous homologous protein pairs. The generative nature of our model enabled us to evaluate its validity and to infer various quantities, such as protein structures or alignments from sequences of amino acids, dihedral angles, secondary structure labels, or any combination thereof.
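To make the idea of a diffusion on the two-dimensional torus concrete, the sketch below simulates a pair of dihedral angles that drift towards a mean while being wrapped to [-pi, pi). It is only an assumed illustration: the abstract does not specify the form of the diffusion, so the mean-reverting drift, the independent treatment of the two angles, the Euler-Maruyama discretisation, and all names and parameters (`mu`, `alpha`, `sigma`) are assumptions and not the ETDBN model itself.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def simulate_torus_diffusion(phi0, psi0, mu, alpha, sigma, t, steps=1000, seed=0):
    """Euler-Maruyama simulation of a mean-reverting diffusion on the torus.

    phi0, psi0: starting dihedral angles (radians)
    mu:         (phi, psi) pair the angles drift towards (assumed)
    alpha:      drift strength (assumed)
    sigma:      noise scale (assumed)
    t:          total evolutionary time, split into `steps` increments
    """
    rng = random.Random(seed)
    dt = t / steps
    phi, psi = phi0, psi0
    for _ in range(steps):
        # The drift follows the shortest arc to the mean, so it respects periodicity.
        phi += alpha * wrap(mu[0] - phi) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        psi += alpha * wrap(mu[1] - psi) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Wrap after every step so the state stays on the torus.
        phi, psi = wrap(phi), wrap(psi)
    return phi, psi
```

Calling `simulate_torus_diffusion(3.0, 0.0, (-3.0, 0.0), 1.0, 0.0, 10.0)` with the noise switched off shows the wrap-around behaviour: the first angle moves from 3.0 to roughly -3.0 by crossing the boundary at pi, not by travelling the long way round through zero.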
- Published
- 2017