The study of human disease has advanced tremendously in the past two decades. While bacteria and viruses account for the majority of infectious conditions humans can contract, mutations in our genome can also result in disease states. Our increased access to genetic sequencing data has shown us that only a small fraction of mutations observed in humans result in disease, but identifying them in a background of benign variation is a daunting task. There is often insufficient evidence to confidently classify a particular mutation as pathogenic or benign, and consequently variants of unknown clinical significance are frequently identified. Apart from waiting for more sequencing evidence to accumulate, there are two general approaches to elucidating the clinical consequences of unknown variants. The first is to experimentally compare the activity of the mutant protein to the wild type, which can be both expensive and time-consuming. The second is to predict the outcome of the mutation on protein function using a computer programme known as a variant effect predictor (VEP). VEPs are rapid and freely available, but their performance levels are often inconsistent across proteins. In the first chapter, I review the methodology behind benchmarking VEPs. With many dozens of VEPs available, it is often unclear which produce the best performance. Benchmarking studies often disagree markedly and rarely come to a consensus regarding the best VEPs. Furthermore, internal benchmarks performed by the authors of new predictors almost exclusively find their own method to be superior to all previous predictors assessed. The most commonly employed benchmarking strategies involve testing predictors against large datasets of known pathogenic and benign variants, which may overlap the training data of supervised VEPs. This leads to data circularity, resulting in unfairly inflated performance estimates for some predictors.
An alternative to traditional benchmarking methods is the use of data derived from large-scale functional assays such as deep mutational scanning (DMS). Such datasets are fully independent of previous variant classifications and far less prone to issues arising from circularity. I discuss some of the key advances in VEP methodology, current benchmarking strategies, and how well functional assays can overcome the issues of data circularity. I also discuss the ability of functional assays to directly predict the clinical impact of variants. In the second chapter, I put some of the improved benchmarking methods discussed in the first chapter into practice. Using a diverse dataset composed of 31 previously published DMS experiments, I benchmarked 46 VEPs that were based on varying methodologies. DeepSequence, an unsupervised VEP, stood out among all predictors assessed as having the strongest correlation with functional assays in human proteins. I also assessed the ability of the DMS data and VEPs to directly predict clinical effects. DMS experiments tended to be superior to computational predictors for this purpose, although among the VEPs, DeepSequence once again produced the top performance. An update to these results is presented in the third chapter, including additional human DMS datasets, more recent predictors and an updated benchmarking methodology. In this update, I found that a new predictor, VARITY, outperformed DeepSequence, and that several newer methods, including EVE and MetaRNN, also produced good results. In the fourth chapter, I used a statistical approach to investigate the nature of disease variants that occur at protein interfaces. I identified several features that may prove useful to future VEPs. By dividing protein interfaces into homomeric (isologous and heterologous), heteromeric, DNA, RNA and other ligand interfaces, I showed that different interface types vary greatly in their propensity to be associated with pathogenic mutations.
Variants in heterologous and DNA interfaces were particularly enriched in disease. I also showed that residues that do not directly participate in the interface but are close to it in 3D space also show a significant disease enrichment. Disease-associated amino acid substitutions at different interface types also tended to involve distinct changes in residue properties. VEP predictions of pathogenicity varied markedly by interface type and protein region; the properties that distinguish each region may make useful features for future prediction methods. In the fifth chapter, I present some of the indirect research outputs of this project, primarily in the form of contributions to published work by obtaining and analysing VEP output. These works included an analysis of missense variants in the PAX6 gene that result in a severe phenotype, providing evidence for a pathogenic substitution in USH2A, developing an alternative technique for identifying clonal haematopoiesis, producing a model of multimer and fibril evolution, and identifying pathogenic mechanisms that are poorly predicted by current VEP technology. The sixth and final chapter reiterates the key findings of this project and speculates about possible future research directions. This project included the largest and most inclusive DMS-based VEP benchmarking study to date and demonstrated the feasibility and utility of such approaches moving forwards. My analysis of protein interfaces also highlighted several aspects, such as the relationship between interface proximity and disease enrichment, that may prove valuable to variant analysis or VEP development in the future.