41 results on '"Schubach M"'
Search Results
2. 8P - Diagnostic Next-Generation Sequencing Panel for Hereditary Breast and Ovarian Cancer
- Author
-
Menzel, M.M., Scheurenbrand, T., Sprecher, A., Schubach, M., Battke, F., and Biskup, S.
- Published
- 2013
- Full Text
- View/download PDF
3. Characterization of glycosylphosphatidylinositol biosynthesis defects by clinical features, flow cytometry, and automated image analysis
- Author
-
Knaus, A. (Alexej), Pantel, J.T. (Jean Tori), Pendziwiat, M. (Manuela), Hajjir, N. (Nurulhuda), Zhao, M. (Max), Hsieh, T.-C. (Tzung-Chien), Schubach, M. (Max), Gurovich, Y. (Yaron), Fleischer, N. (Nicole), Jäger, M. (Marten), Köhler, S. (Sebastian), Muhle, H. (Hiltrud), Korff, C. (Christian), Møller, R.S. (Rikke S.), Bayat, A. (Allan), Calvas, P. (Patrick), Chassaing, N. (Nicolas), Warren, H. (Hannah), Skinner, S. (Steven), Louie, R. (Raymond), Evers, C. (Christina), Bohn, M. (Marc), Christen, H.-J. (Hans-Jürgen), Born, M. (Myrthe) van den, Obersztyn, E. (Ewa), Charzewska, A. (Agnieszka), Endziniene, M. (Milda), Kortüm, F. (Fanny), Brown, N. (Natasha), Robinson, P.N. (Peter N.), Schelhaas, H.J. (Helenius), Weber, Y. (Yvonne), Helbig, I. (Ingo), Mundlos, S. (Stefan), Horn, D. (Denise), and Krawitz, P.
- Abstract
Background: Glycosylphosphatidylinositol biosynthesis defects (GPIBDs) cause a group of phenotypically overlapping recessive syndromes with intellectual disability, for which pathogenic mutations have been described in 16 genes of the corresponding molecular pathway. An elevated serum activity of alkaline phosphatase (AP), a GPI-linked enzyme, has been used to assign GPIBDs to the phenotypic series of hyperphosphatasia with mental retardation syndrome (HPMRS) and to distinguish them from another subset of GPIBDs, termed multiple congenital anomalies hypotonia seizures syndrome (MCAHS). However, the increasing number of individuals with a GPIBD shows that hyperphosphatasia is a variable feature that is not ideal for a clinical classification. Methods: We studied the discriminatory power of multiple GPI-linked substrates that were assessed by flow cytometry in blood cells and fibroblasts of 39 and 14 individuals with a GPIBD, respectively. On the phenotypic level, we evaluated the frequency of occurrence of clinical symptoms and analyzed the performance of computer-assisted image analysis of the facial gestalt in 91 individuals. Results: We found that certain malformations such as Morbus Hirschsprung and diaphragmatic defects are more likely to be associated with particular gene defects (PIGV, PGAP3, PIGN). However, especially at the severe end of the clinical spectrum of HPMRS, there is a high phenotypic overlap with MCAHS. Elevation of AP has also been documented in some of the individuals with MCAHS, namely those with PIGA mutations. Although the impairment of GPI-linked substrates is supposed to play the key role in the pathophysiology of GPIBDs, we could not observe gene-specific profiles for flow cytometric markers or a correlation between their cell surface levels and the severity of the phenotype. In contrast, it was facial recognition software that achieved the highest accuracy in predicting the disease-causing gene in a GPIBD. Conclusions: Due to the overlap
- Published
- 2018
- Full Text
- View/download PDF
4. Characterization of glycosylphosphatidylinositol biosynthesis defects by clinical features, flow cytometry, and automated image analysis
- Author
-
Knaus, A, Pantel, JT, Pendziwiat, M, Hajjir, N, Zhao, M, Hsieh, T-C, Schubach, M, Gurovich, Y, Fleischer, N, Jaeger, M, Koehler, S, Muhle, H, Korff, C, Moller, RS, Bayat, A, Calvas, P, Chassaing, N, Warren, H, Skinner, S, Louie, R, Evers, C, Bohn, M, Christen, H-J, van den Born, M, Obersztyn, E, Charzewska, A, Endziniene, M, Kortuem, F, Brown, N, Robinson, PN, Schelhaas, HJ, Weber, Y, Helbig, I, Mundlos, S, Horn, D, and Krawitz, PM
- Abstract
BACKGROUND: Glycosylphosphatidylinositol biosynthesis defects (GPIBDs) cause a group of phenotypically overlapping recessive syndromes with intellectual disability, for which pathogenic mutations have been described in 16 genes of the corresponding molecular pathway. An elevated serum activity of alkaline phosphatase (AP), a GPI-linked enzyme, has been used to assign GPIBDs to the phenotypic series of hyperphosphatasia with mental retardation syndrome (HPMRS) and to distinguish them from another subset of GPIBDs, termed multiple congenital anomalies hypotonia seizures syndrome (MCAHS). However, the increasing number of individuals with a GPIBD shows that hyperphosphatasia is a variable feature that is not ideal for a clinical classification. METHODS: We studied the discriminatory power of multiple GPI-linked substrates that were assessed by flow cytometry in blood cells and fibroblasts of 39 and 14 individuals with a GPIBD, respectively. On the phenotypic level, we evaluated the frequency of occurrence of clinical symptoms and analyzed the performance of computer-assisted image analysis of the facial gestalt in 91 individuals. RESULTS: We found that certain malformations such as Morbus Hirschsprung and diaphragmatic defects are more likely to be associated with particular gene defects (PIGV, PGAP3, PIGN). However, especially at the severe end of the clinical spectrum of HPMRS, there is a high phenotypic overlap with MCAHS. Elevation of AP has also been documented in some of the individuals with MCAHS, namely those with PIGA mutations. Although the impairment of GPI-linked substrates is supposed to play the key role in the pathophysiology of GPIBDs, we could not observe gene-specific profiles for flow cytometric markers or a correlation between their cell surface levels and the severity of the phenotype. In contrast, it was facial recognition software that achieved the highest accuracy in predicting the disease-causing gene in a GPIBD. CONCLUSIONS: Due to the overlap
- Published
- 2018
5. Expanding the phenotype of a recurrent de novo variant in PACS1 causing intellectual disability
- Author
-
Gadzicki, D., Döcker, D., Schubach, M., Menzel, M., Schmorl, B., Stellmer, F., Biskup, S., and Bartholdi, D.
- Published
- 2014
- Full Text
- View/download PDF
6. Diagnostic Next-Generation Sequencing Panel for Hereditary Breast and Ovarian Cancer
- Author
-
Menzel, M.M., Scheurenbrand, T., Sprecher, A., Schubach, M., Battke, F., and Biskup, S.
- Published
- 2013
- Full Text
- View/download PDF
7. Short clones or long clones? A simulation study on the use of paired reads in metagenomics
- Author
-
Schubach Max, Mitra Suparna, and Huson Daniel H
- Subjects
- Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Metagenomics is the study of environmental samples using sequencing. Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. Most metagenome sequencing projects so far have been based on Sanger or Roche-454 sequencing, as only these technologies provide long enough reads, while Illumina sequencing has not been considered suitable for metagenomic studies due to a short read length of only 35 bp. However, now that reads of length 75 bp can be sequenced in pairs, Illumina sequencing has become a viable option for metagenome studies. Results: This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs. Conclusion: This work shows that paired reads perform better than single reads, as expected, but also, perhaps slightly less obviously, that long clones allow more specific assignments than short ones. A new version of the program MEGAN that explicitly takes paired reads into account is available from our website.
- Published
- 2010
- Full Text
- View/download PDF
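Note on result 7: the abstract says reads are processed in pairs and assigned based on the combined bit scores of their matches to reference sequences. The idea can be sketched roughly as below. This is only an illustration, not MEGAN's implementation: the function names, the dictionary input format, and the `top_fraction` cutoff are all assumptions made for the example.

```python
from collections import defaultdict


def combined_bit_scores(hits_read1, hits_read2):
    """Sum bit scores per reference across both mates of a read pair.

    Each argument maps a reference name to the bit score of that mate's
    match (hypothetical input format, chosen for this sketch).
    """
    combined = defaultdict(float)
    for hits in (hits_read1, hits_read2):
        for ref, score in hits.items():
            combined[ref] += score
    return dict(combined)


def best_references(combined, top_fraction=0.9):
    """Keep references whose combined score is close to the best one.

    top_fraction is an assumed cutoff for illustration only.
    """
    if not combined:
        return []
    best = max(combined.values())
    return sorted(ref for ref, s in combined.items() if s >= top_fraction * best)


pair_scores = combined_bit_scores({"A": 50.0, "B": 48.0}, {"A": 40.0, "C": 45.0})
candidates = best_references(pair_scores)
```

In this toy pair, reference A is matched by both mates, so its combined score separates it from B and C, which each match only one mate.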
8. Using individual barcodes to increase quantification power of massively parallel reporter assays.
- Author
-
Keukeleire P, Rosen JD, Göbel-Knapp A, Salomon K, Schubach M, and Kircher M
- Subjects
- Software, Humans, Genes, Reporter, High-Throughput Nucleotide Sequencing methods
- Abstract
Background: Massively parallel reporter assays (MPRAs) are an experimental technology for measuring the activity of thousands of candidate regulatory sequences or their variants in parallel, where the activity of individual sequences is measured from pools of sequence-tagged reporter genes. Activity is derived from the ratio of transcribed RNA to input DNA counts of associated tag sequences in each reporter construct, so-called barcodes. Recently, tools specifically designed to analyze MPRA data were developed that attempt to model the count data, accounting for its inherent variation. Of these tools, MPRAnalyze and mpralm are most widely used. MPRAnalyze models barcode counts to estimate the transcription rate of each sequence. While it has increased statistical power and robustness against outliers compared to mpralm, it is slow and has a high false discovery rate. Mpralm, a tool built on the R package Limma, estimates log fold-changes between different sequences. As opposed to MPRAnalyze, it is fast and has a low false discovery rate but is susceptible to outliers and has less statistical power., Results: We propose BCalm, an MPRA analysis framework aimed at addressing the limitations of the existing tools. BCalm is an adaptation of mpralm, but models individual barcode counts instead of aggregating counts per sequence. Leaving out the aggregation step increases statistical power and improves robustness to outliers, while being fast and precise. We show the improved performance over existing methods on both simulated MPRA data and a lentiviral MPRA library of 166,508 target sequences, including 82,258 allelic variants. Further, BCalm adds functionality beyond the existing mpralm package, such as preparing count input files from MPRAsnakeflow, as well as an option to test for sequences with enhancing or repressing activity. 
Its built-in plotting functionalities allow for easy interpretation of the results., Conclusions: With BCalm, we provide a new tool for analyzing MPRA data which is robust and accurate on real MPRA datasets. The package is available at https://github.com/kircherlab/BCalm ., Competing Interests: Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests., (© 2025. The Author(s).)
- Published
- 2025
- Full Text
- View/download PDF
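Note on result 8: the abstract describes activity as the ratio of transcribed RNA to input DNA counts of the barcodes, and contrasts modelling individual barcodes with aggregating counts per sequence. A minimal sketch of those two views follows. It is not the BCalm or mpralm code; the function names, the `(dna, rna)` tuple layout, and the pseudocount are assumptions for illustration.

```python
import math


def barcode_log_ratios(barcodes, pseudocount=1.0):
    """log2(RNA/DNA) for each barcode of one sequence.

    barcodes is a list of (dna_count, rna_count) tuples; the pseudocount
    is an assumed guard against zero counts.
    """
    return [
        math.log2((rna + pseudocount) / (dna + pseudocount))
        for dna, rna in barcodes
    ]


def aggregated_log_ratio(barcodes, pseudocount=1.0):
    """log2(RNA/DNA) after summing counts over all barcodes of a sequence."""
    total_dna = sum(dna for dna, _ in barcodes)
    total_rna = sum(rna for _, rna in barcodes)
    return math.log2((total_rna + pseudocount) / (total_dna + pseudocount))


example = [(3, 7), (1, 3)]
per_barcode = barcode_log_ratios(example)
aggregated = aggregated_log_ratio(example)
```

The per-barcode values keep one observation per barcode, which is the information that is lost when counts are summed first.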
9. Massively parallel characterization of transcriptional regulatory elements.
- Author
-
Agarwal V, Inoue F, Schubach M, Penzar D, Martin BK, Dash PM, Keukeleire P, Zhang Z, Sohota A, Zhao J, Georgakopoulos-Soares I, Noble WS, Yardımcı GG, Kulakovskiy IV, Kircher M, Shendure J, and Ahituv N
- Abstract
The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific 'on switches' that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar., Competing Interests: Competing interests: V.A. is an employee of Sanofi, but performed this work independently of Sanofi. J.S. is a scientific advisory board member, consultant and/or co-founder of Cajal Neuroscience, Guardant Health, Maze Therapeutics, Camp4 Therapeutics, Phase Genomics, Adaptive Biotechnologies, Scale Biosciences, Somite Therapeutics, Sixth Street Capital and Pacific Biosciences. N.A. is a co-founder and on the scientific advisory board of Regel Therapeutics and receives funding from BioMarin Pharmaceutical Incorporated. All other authors declare no competing interests., (© 2025. The Author(s).)
- Published
- 2025
- Full Text
- View/download PDF
10. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions.
- Author
-
Schubach M, Maass T, Nazaretyan L, Röner S, and Kircher M
- Subjects
- Nucleotides, Humans, Machine Learning, Genetic Variation, Software, Genome, Human
- Abstract
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community., (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2024
- Full Text
- View/download PDF
11. Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types.
- Author
-
Agarwal V, Inoue F, Schubach M, Martin BK, Dash PM, Zhang Z, Sohota A, Noble WS, Yardimci GG, Kircher M, Shendure J, and Ahituv N
- Abstract
The human genome contains millions of candidate cis -regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific 'on switches' providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar., Competing Interests: Competing interests. V.A. is an employee of Sanofi Pasteur Inc. J.S. is a scientific advisory board member, consultant and/or co-founder of Cajal Neuroscience, Guardant Health, Maze Therapeutics, Camp4 Therapeutics, Phase Genomics, Adaptive Biotechnologies, Scale Biosciences, Sixth Street Capital, and Pacific Biosciences. N.A. is the cofounder and on the scientific advisory board of Regel Therapeutics and receives funding from BioMarin Pharmaceutical Incorporated. All other authors declare no competing interests.
- Published
- 2023
- Full Text
- View/download PDF
12. The Regulatory Mendelian Mutation score for GRCh38.
- Author
-
Schubach M, Nazaretyan L, and Kircher M
- Subjects
- Humans, Mutation, Databases, Genetic, Software, Genome, Human
- Abstract
Background: Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease causal noncoding variants. Some tools for assessment of variant pathogenicity as well as annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software, and pipelines was slow., Results: Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup., Conclusions: Scores of the GRCh38 genome build are highly correlated to the prior release with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files of GRCh37 and GRCh38 genome builds are cited in the article and the website; UCSC genome browser tracks, and an API are available at https://remm.bihealth.org., (© The Author(s) 2023. Published by Oxford University Press GigaScience.)
- Published
- 2022
- Full Text
- View/download PDF
13. Boosting tissue-specific prediction of active cis-regulatory regions through deep learning and Bayesian optimization techniques.
- Author
-
Cappelletti L, Petrini A, Gliozzo J, Casiraghi E, Schubach M, Kircher M, and Valentini G
- Subjects
- Humans, Bayes Theorem, Regulatory Sequences, Nucleic Acid, Neural Networks, Computer, Machine Learning, Deep Learning
- Abstract
Background: Cis-regulatory regions (CRRs) are non-coding regions of the DNA that fine control the spatio-temporal pattern of transcription; they are involved in a wide range of pivotal processes such as the development of specific cell-lines/tissues and the dynamic cell response to physiological stimuli. Recent studies showed that genetic variants occurring in CRRs are strongly correlated with pathogenicity or deleteriousness. Considering the central role of CRRs in the regulation of physiological and pathological conditions, the correct identification of CRRs and of their tissue-specific activity status through Machine Learning methods plays a major role in dissecting the impact of genetic variants on human diseases. Unfortunately, the problem is still open, though some promising results have been already reported by (deep) machine-learning based methods that predict active promoters and enhancers in specific tissues or cell lines by encoding epigenetic or spectral features directly extracted from DNA sequences., Results: We present the experiments we performed to compare two Deep Neural Networks, a Feed-Forward Neural Network model working on epigenomic features, and a Convolutional Neural Network model working only on genomic sequence, targeted to the identification of enhancer- and promoter-activity in specific cell lines. 
While performing experiments to understand how the experimental setup influences the prediction performance of the methods, we particularly focused on (1) automatic model selection performed by Bayesian optimization and (2) exploring different data rebalancing setups for reducing negative unbalancing effects., Conclusions: Results show that (1) automatic model selection by Bayesian optimization improves the quality of the learner; (2) data rebalancing considerably impacts the prediction performance of the models; test set rebalancing may provide over-optimistic results, and should therefore be cautiously applied; (3) despite working on sequence data, convolutional models obtain performance close to those of feed forward models working on epigenomic information, which suggests that also sequence data carries informative content for CRR-activity prediction. We therefore suggest combining both models/data types in future works., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
14. Genetic Diagnostics in Routine Osteological Assessment of Adult Low Bone Mass Disorders.
- Author
-
Oheim R, Tsourdi E, Seefried L, Beller G, Schubach M, Vettorazzi E, Stürznickel J, Rolvien T, Ehmke N, Delsmann A, Genest F, Krüger U, Zemojtel T, Barvencik F, Schinke T, Jakob F, Hofbauer LC, Mundlos S, and Kornak U
- Subjects
- Adult, Bone Density genetics, Female, Genotype, High-Throughput Nucleotide Sequencing, Humans, Male, Mutation, Osteogenesis Imperfecta diagnosis, Osteogenesis Imperfecta genetics, Osteoporosis diagnosis, Osteoporosis genetics, Spinal Fractures
- Abstract
Context: Many different inherited and acquired conditions can result in premature bone fragility/low bone mass disorders (LBMDs)., Objective: We aimed to elucidate the impact of genetic testing on differential diagnosis of adult LBMDs and at defining clinical criteria for predicting monogenic forms., Methods: Four clinical centers broadly recruited a cohort of 394 unrelated adult women before menopause and men younger than 55 years with a bone mineral density (BMD) Z-score < -2.0 and/or pathological fractures. After exclusion of secondary causes or unequivocal clinical/biochemical hallmarks of monogenic LBMDs, all participants were genotyped by targeted next-generation sequencing., Results: In total, 20.8% of the participants carried rare disease-causing variants (DCVs) in genes known to cause osteogenesis imperfecta (COL1A1, COL1A2), hypophosphatasia (ALPL), and early-onset osteoporosis (LRP5, PLS3, and WNT1). In addition, we identified rare DCVs in ENPP1, LMNA, NOTCH2, and ZNF469. Three individuals had autosomal recessive, 75 autosomal dominant, and 4 X-linked disorders. A total of 9.7% of the participants harbored variants of unknown significance. A regression analysis revealed that the likelihood of detecting a DCV correlated with a positive family history of osteoporosis, peripheral fractures (> 2), and a high normal body mass index (BMI). In contrast, mutation frequencies did not correlate with age, prevalent vertebral fractures, BMD, or biochemical parameters. In individuals without monogenic disease-causing rare variants, common variants predisposing for low BMD (eg, in LRP5) were overrepresented., Conclusion: The overlapping spectra of monogenic adult LBMD can be easily disentangled by genetic testing and the proposed clinical criteria can help to maximize the diagnostic yield., (© The Author(s) 2022. Published by Oxford University Press on behalf of the Endocrine Society.)
- Published
- 2022
- Full Text
- View/download PDF
15. Author Correction: lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements.
- Author
-
Gordon MG, Inoue F, Martin B, Schubach M, Agarwal V, Whalen S, Feng S, Zhao J, Ashuach T, Ziffra R, Kreimer A, Georgakopoulos-Soares I, Yosef N, Ye CJ, Pollard KS, Shendure J, Kircher M, and Ahituv N
- Published
- 2021
- Full Text
- View/download PDF
16. CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores.
- Author
-
Rentzsch P, Schubach M, Shendure J, and Kircher M
- Subjects
- Base Sequence, Exons genetics, Humans, Introns genetics, Deep Learning, Genetic Variation, Genome-Wide Association Study, RNA Splicing genetics
- Abstract
Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies., Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants., Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu ), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance., Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
- Published
- 2021
- Full Text
- View/download PDF
17. The impact of different negative training data on regulatory sequence predictions.
- Author
-
Krützfeldt LM, Schubach M, and Kircher M
- Subjects
- A549 Cells, Cell Line, Tumor, Chromatin genetics, DNA genetics, Genome genetics, Genomics methods, HeLa Cells, Hep G2 Cells, Humans, K562 Cells, MCF-7 Cells, Neural Networks, Computer, Promoter Regions, Genetic genetics, Sequence Analysis, DNA, Support Vector Machine, Regulatory Sequences, Nucleic Acid genetics
- Abstract
Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training dataset, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need of hyperparameter optimization., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2020
- Full Text
- View/download PDF
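Note on result 17: the abstract mentions negative training sets built from sequence shuffles of the positive sequences. One simple way to picture that is sketched below. This is an assumption-laden illustration rather than the authors' procedure: it shuffles single nucleotides only, and the function names and seed handling are hypothetical.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq, preserving base composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def shuffled_negatives(positives, seed=0):
    """Build one shuffled negative per positive sequence (assumed 1:1 ratio)."""
    rng = random.Random(seed)
    return [shuffle_sequence(seq, rng) for seq in positives]


positives = ["ACGTACGTAA", "GGGCCCATAT"]
negatives = shuffled_negatives(positives)
```

Each negative has the same length and nucleotide counts as its positive, but any motif order is scrambled, which fits the abstract's caution that models trained on heavily shuffled sequences may learn features of artificial sequences.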
18. lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements.
- Author
-
Gordon MG, Inoue F, Martin B, Schubach M, Agarwal V, Whalen S, Feng S, Zhao J, Ashuach T, Ziffra R, Kreimer A, Georgakopoulos-Soares I, Yosef N, Ye CJ, Pollard KS, Shendure J, Kircher M, and Ahituv N
- Subjects
- Base Sequence, Lentivirus genetics, Regulatory Sequences, Nucleic Acid genetics, Sequence Analysis, DNA methods, Workflow
- Abstract
Massively parallel reporter assays (MPRAs) can simultaneously measure the function of thousands of candidate regulatory sequences (CRSs) in a quantitative manner. In this method, CRSs are cloned upstream of a minimal promoter and reporter gene, alongside a unique barcode, and introduced into cells. If the CRS is a functional regulatory element, it will lead to the transcription of the barcode sequence, which is measured via RNA sequencing and normalized for cellular integration via DNA sequencing of the barcode. This technology has been used to test thousands of sequences and their variants for regulatory activity, to decipher the regulatory code and its evolution, and to develop genetic switches. Lentivirus-based MPRA (lentiMPRA) produces 'in-genome' readouts and enables the use of this technique in hard-to-transfect cells. Here, we provide a detailed protocol for lentiMPRA, along with a user-friendly Nextflow-based computational pipeline-MPRAflow-for quantifying CRS activity from different MPRA designs. The lentiMPRA protocol takes ~2 months, which includes sequencing turnaround time and data processing with MPRAflow.
- Published
- 2020
- Full Text
- View/download PDF
19. parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.
- Author
-
Petrini A, Mesiti M, Schubach M, Frasca M, Danis D, Re M, Grossi G, Cappelletti L, Castrignanò T, Robinson PN, and Valentini G
- Subjects
- Algorithms, Databases, Genetic, Genomics methods, Humans, Machine Learning, Reproducibility of Results, Computational Biology methods, Genetic Predisposition to Disease, Genetic Variation, Genome-Wide Association Study methods, Software
- Abstract
Background: Several prediction problems in computational biology and genomic medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data., Results: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. 
Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version., Conclusions: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF., (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2020
- Full Text
- View/download PDF
20. PEDIA: prioritization of exome data by image analysis.
- Author
-
Hsieh TC, Mensah MA, Pantel JT, Aguilar D, Bar O, Bayat A, Becerra-Solano L, Bentzen HB, Biskup S, Borisov O, Braaten O, Ciaccio C, Coutelier M, Cremer K, Danyel M, Daschkey S, Eden HD, Devriendt K, Wilson S, Douzgou S, Đukić D, Ehmke N, Fauth C, Fischer-Zirnsak B, Fleischer N, Gabriel H, Graul-Neumann L, Gripp KW, Gurovich Y, Gusina A, Haddad N, Hajjir N, Hanani Y, Hertzberg J, Hoertnagel K, Howell J, Ivanovski I, Kaindl A, Kamphans T, Kamphausen S, Karimov C, Kathom H, Keryan A, Knaus A, Köhler S, Kornak U, Lavrov A, Leitheiser M, Lyon GJ, Mangold E, Reina PM, Carrascal AM, Mitter D, Herrador LM, Nadav G, Nöthen M, Orrico A, Ott CE, Park K, Peterlin B, Pölsler L, Raas-Rothschild A, Randolph L, Revencu N, Fagerberg CR, Robinson PN, Rosnev S, Rudnik S, Rudolf G, Schatz U, Schossig A, Schubach M, Shanoon O, Sheridan E, Smirin-Yosef P, Spielmann M, Suk EK, Sznajer Y, Thiel CT, Thiel G, Verloes A, Vrecar I, Wahl D, Weber I, Winter K, Wiśniewska M, Wollnik B, Yeung MW, Zhao M, Zhu N, Zschocke J, Mundlos S, Horn D, and Krawitz PM
- Subjects
- Algorithms, Databases, Genetic, Deep Learning, Exome genetics, Female, Genomics, Humans, Male, Phenotype, Software, Computational Biology methods, Image Processing, Computer-Assisted methods, Sequence Analysis, DNA methods
- Abstract
Purpose: Phenotype information is crucial for the interpretation of genomic variants. So far it has only been accessible for bioinformatics workflows after encoding into clinical terms by expert dysmorphologists., Methods: Here, we introduce an approach driven by artificial intelligence that uses portrait photographs for the interpretation of clinical exome data. We measured the value added by computer-assisted image analysis to the diagnostic yield on a cohort consisting of 679 individuals with 105 different monogenic disorders. For each case in the cohort we compiled frontal photos, clinical features, and the disease-causing variants, and simulated multiple exomes of different ethnic backgrounds., Results: The additional use of similarity scores from computer-assisted analysis of frontal photos improved the top 1 accuracy rate by more than 20-89% and the top 10 accuracy rate by more than 5-99% for the disease-causing gene., Conclusion: Image analysis by deep-learning algorithms can be used to quantify the phenotypic similarity (PP4 criterion of the American College of Medical Genetics and Genomics guidelines) and to advance the performance of bioinformatics pipelines for exome analysis.
- Published
- 2019
- Full Text
- View/download PDF
21. Haploinsufficiency of the Notch Ligand DLL1 Causes Variable Neurodevelopmental Disorders.
- Author
-
Fischer-Zirnsak B, Segebrecht L, Schubach M, Charles P, Alderman E, Brown K, Cadieux-Dion M, Cartwright T, Chen Y, Costin C, Fehr S, Fitzgerald KM, Fleming E, Foss K, Ha T, Hildebrand G, Horn D, Liu S, Marco EJ, McDonald M, McWalter K, Race S, Rush ET, Si Y, Saunders C, Slavotinek A, Stockler-Ipsiroglu S, Telegrafi A, Thiffault I, Torti E, Tsai AC, Wang X, Zafar M, Keren B, Kornak U, Boerkoel CF, Mirzaa G, and Ehmke N
- Subjects
- Cohort Studies, Female, Humans, Ligands, Male, Pedigree, Exome Sequencing, Calcium-Binding Proteins genetics, Haploinsufficiency, Membrane Proteins genetics, Neurodevelopmental Disorders genetics
- Abstract
Notch signaling is an established developmental pathway for brain morphogenesis. Given that Delta-like 1 (DLL1) is a ligand for the Notch receptor and that a few individuals with developmental delay, intellectual disability, and brain malformations have microdeletions encompassing DLL1, we hypothesized that insufficiency of DLL1 causes a human neurodevelopmental disorder. We performed exome sequencing in individuals with neurodevelopmental disorders. The cohort was identified using known Matchmaker Exchange nodes such as GeneMatcher. This method identified 15 individuals from 12 unrelated families with heterozygous pathogenic DLL1 variants (nonsense, missense, splice site, and one whole gene deletion). The most common features in our cohort were intellectual disability, autism spectrum disorder, seizures, variable brain malformations, muscular hypotonia, and scoliosis. We did not identify an obvious genotype-phenotype correlation. Analysis of one splice site variant showed an in-frame insertion of 12 bp. In conclusion, heterozygous DLL1 pathogenic variants cause a variable neurodevelopmental phenotype and multi-systemic features. The clinical and molecular data support haploinsufficiency as a mechanism for the pathogenesis of this DLL1-related disorder and affirm the importance of DLL1 in human brain development., (Copyright © 2019 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
22. Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay.
- Author
-
Shigaki D, Adato O, Adhikari AN, Dong S, Hawkins-Hooker A, Inoue F, Juven-Gershon T, Kenlay H, Martin B, Patra A, Penzar DD, Schubach M, Xiong C, Yan Z, Boyle AP, Kreimer A, Kulakovskiy IV, Reid J, Unger R, Yosef N, Shendure J, Ahituv N, Kircher M, and Beer MA
- Subjects
- Binding Sites, Cell Line, Chromatin genetics, DNA metabolism, Enhancer Elements, Genetic, Genetic Predisposition to Disease, Humans, Machine Learning, Promoter Regions, Genetic, Transcription Factors metabolism, DNA chemistry, Epigenomics methods, Point Mutation
- Abstract
The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation., (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
- Full Text
- View/download PDF
23. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution.
- Author
-
Kircher M, Xiong C, Martin B, Schubach M, Inoue F, Bell RJA, Costello JF, Shendure J, and Ahituv N
- Subjects
- Cell Line, Cloning, Molecular, Genome, Human genetics, Genomic Library, High-Throughput Nucleotide Sequencing, Humans, Polymorphism, Single Nucleotide, Computational Biology methods, Disease genetics, Mutagenesis, Regulatory Elements, Transcriptional genetics
- Abstract
The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we perform saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitutions and deletions. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and comprise a rich dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.
- Published
- 2019
- Full Text
- View/download PDF
24. Multisite de novo mutations in human offspring after paternal exposure to ionizing radiation.
- Author
-
Holtgrewe M, Knaus A, Hildebrand G, Pantel JT, Santos MRL, Neveling K, Goldmann J, Schubach M, Jäger M, Coutelier M, Mundlos S, Beule D, Sperling K, and Krawitz PM
- Subjects
- Adult, Animals, Base Sequence, Cohort Studies, Computational Biology methods, Female, Humans, Infant, Newborn, Male, Mice, Military Personnel, Mutation Rate, Pilot Projects, Risk Factors, Whole Genome Sequencing, Genome, Human, Germ-Line Mutation, Paternal Exposure, Radiation, Ionizing
- Abstract
A genome-wide evaluation of the effects of ionizing radiation on mutation induction in the mouse germline has identified multisite de novo mutations (MSDNs) as a marker for previous exposure. Here we present the results of a small pilot study of whole genome sequencing in offspring of soldiers who served in radar units on weapon systems that were emitting high-frequency radiation. We found cases of exceptionally high MSDN rates as well as an increased mean in our cohort: While a MSDN mutation is detected on average in 1 out of 5 offspring of unexposed controls, we observed 12 MSDNs in altogether 18 offspring, including a family with 6 MSDNs in 3 offspring. Moreover, we found two translocations, also resulting from neighboring mutations. Our findings indicate that MSDNs might be suited in principle for the assessment of DNA damage from ionizing radiation also in humans. However, as exact person-related dose values in risk groups are usually not available, the interpretation of MSDNs in single families would benefit from larger molecular epidemiologic studies on this new biomarker.
- Published
- 2018
- Full Text
- View/download PDF
25. Immune monitoring and TCR sequencing of CD4 T cells in a long term responsive patient with metastasized pancreatic ductal carcinoma treated with individualized, neoepitope-derived multipeptide vaccines: a case report.
- Author
-
Sonntag K, Hashimoto H, Eyrich M, Menzel M, Schubach M, Döcker D, Battke F, Courage C, Lambertz H, Handgretinger R, Biskup S, and Schilbach K
- Subjects
- Amino Acid Sequence, Carcinoma, Pancreatic Ductal blood, Carcinoma, Pancreatic Ductal immunology, Carcinoma, Pancreatic Ductal secondary, Humans, Male, Middle Aged, Pancreatic Neoplasms blood, Pancreatic Neoplasms immunology, Pancreatic Neoplasms secondary, Peptides chemistry, Peptides immunology, Treatment Outcome, Vaccination, CD4-Positive T-Lymphocytes immunology, Cancer Vaccines immunology, Carcinoma, Pancreatic Ductal therapy, Epitopes immunology, Monitoring, Immunologic, Pancreatic Neoplasms therapy, Receptors, Antigen, T-Cell, alpha-beta genetics, Vaccines, Subunit immunology
- Abstract
Background: Cancer vaccines can effectively establish clinically relevant tumor immunity. Novel sequencing approaches rapidly identify the mutational fingerprint of tumors, thus allowing to generate personalized tumor vaccines within a few weeks from diagnosis. Here, we report the case of a 62-year-old patient receiving a four-peptide-vaccine targeting the two sole mutations of his pancreatic tumor, identified via exome sequencing., Methods: Vaccination started during chemotherapy in second complete remission and continued monthly thereafter. We tracked IFN-γ+ T cell responses against vaccine peptides in peripheral blood after 12, 17 and 34 vaccinations by analyzing T-cell receptor (TCR) repertoire diversity and epitope-binding regions of peptide-reactive T-cell lines and clones. By restricting analysis to sorted IFN-γ-producing T cells we could assure epitope-specificity, functionality, and TH1 polarization., Results: A peptide-specific T-cell response against three of the four vaccine peptides could be detected sequentially. Molecular TCR analysis revealed a broad vaccine-reactive TCR repertoire with clones of discernible specificity. Four identical or convergent TCR sequences could be identified at more than one time-point, indicating timely persistence of vaccine-reactive T cells. One dominant TCR expressing a dual TCRVα chain could be found in three T-cell clones. The observed T-cell responses possibly contributed to clinical outcome: The patient is alive 6 years after initial diagnosis and in complete remission for 4 years now., Conclusions: Therapeutic vaccination with a neoantigen-derived four-peptide vaccine resulted in a diverse and long-lasting immune response against these targets which was associated with prolonged clinical remission. These data warrant confirmation in a larger proof-of-concept clinical trial.
- Published
- 2018
- Full Text
- View/download PDF
26. Characterization of glycosylphosphatidylinositol biosynthesis defects by clinical features, flow cytometry, and automated image analysis.
- Author
-
Knaus A, Pantel JT, Pendziwiat M, Hajjir N, Zhao M, Hsieh TC, Schubach M, Gurovich Y, Fleischer N, Jäger M, Köhler S, Muhle H, Korff C, Møller RS, Bayat A, Calvas P, Chassaing N, Warren H, Skinner S, Louie R, Evers C, Bohn M, Christen HJ, van den Born M, Obersztyn E, Charzewska A, Endziniene M, Kortüm F, Brown N, Robinson PN, Schelhaas HJ, Weber Y, Helbig I, Mundlos S, Horn D, and Krawitz PM
- Subjects
- Abnormalities, Multiple metabolism, Automation, Biomarkers metabolism, Humans, Intellectual Disability metabolism, Phenotype, Phosphorus Metabolism Disorders metabolism, Syndrome, Flow Cytometry methods, Glycosylphosphatidylinositols biosynthesis, Image Processing, Computer-Assisted
- Abstract
Background: Glycosylphosphatidylinositol biosynthesis defects (GPIBDs) cause a group of phenotypically overlapping recessive syndromes with intellectual disability, for which pathogenic mutations have been described in 16 genes of the corresponding molecular pathway. An elevated serum activity of alkaline phosphatase (AP), a GPI-linked enzyme, has been used to assign GPIBDs to the phenotypic series of hyperphosphatasia with mental retardation syndrome (HPMRS) and to distinguish them from another subset of GPIBDs, termed multiple congenital anomalies hypotonia seizures syndrome (MCAHS). However, the increasing number of individuals with a GPIBD shows that hyperphosphatasia is a variable feature that is not ideal for a clinical classification., Methods: We studied the discriminatory power of multiple GPI-linked substrates that were assessed by flow cytometry in blood cells and fibroblasts of 39 and 14 individuals with a GPIBD, respectively. On the phenotypic level, we evaluated the frequency of occurrence of clinical symptoms and analyzed the performance of computer-assisted image analysis of the facial gestalt in 91 individuals., Results: We found that certain malformations such as Morbus Hirschsprung and diaphragmatic defects are more likely to be associated with particular gene defects (PIGV, PGAP3, PIGN). However, especially at the severe end of the clinical spectrum of HPMRS, there is a high phenotypic overlap with MCAHS. Elevation of AP has also been documented in some of the individuals with MCAHS, namely those with PIGA mutations. Although the impairment of GPI-linked substrates is supposed to play the key role in the pathophysiology of GPIBDs, we could not observe gene-specific profiles for flow cytometric markers or a correlation between their cell surface levels and the severity of the phenotype. 
In contrast, it was facial recognition software that achieved the highest accuracy in predicting the disease-causing gene in a GPIBD., Conclusions: Due to the overlapping clinical spectrum of both HPMRS and MCAHS in the majority of affected individuals, the elevation of AP and the reduced surface levels of GPI-linked markers in both groups, a common classification as GPIBDs is recommended. The effectiveness of computer-assisted gestalt analysis for the correct gene inference in a GPIBD and probably beyond is remarkable and illustrates how the information contained in human faces is pivotal in the delineation of genetic entities.
- Published
- 2018
- Full Text
- View/download PDF
27. Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.
- Author
-
Notaro M, Schubach M, Robinson PN, and Valentini G
- Subjects
- Area Under Curve, Genetic Association Studies, Humans, Molecular Sequence Annotation, Phenotype, ROC Curve, Algorithms, Biological Ontologies
- Abstract
Background: The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even if for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover most of the methods proposed in literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions., Results: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, that consists in a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity., Conclusions: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO terms predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
- Published
- 2017
- Full Text
- View/download PDF
28. Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants.
- Author
-
Schubach M, Re M, Robinson PN, and Valentini G
- Subjects
- Algorithms, Genome-Wide Association Study, Humans, Models, Genetic, Mutation, Reproducibility of Results, Software, Genetic Predisposition to Disease, Genetic Variation, Machine Learning, RNA, Untranslated
- Abstract
Disease and trait-associated variants represent a tiny minority of all known genetic variation, and therefore there is necessarily an imbalance between the small set of available disease-associated and the much larger set of non-deleterious genomic variation, especially in non-coding regulatory regions of human genome. Machine Learning (ML) methods for predicting disease-associated non-coding variants are faced with a chicken and egg problem - such variants cannot be easily found without ML, but ML cannot begin to be effective until a sufficient number of instances have been found. Most of state-of-the-art ML-based methods do not adopt specific imbalance-aware learning techniques to deal with imbalanced data that naturally arise in several genome-wide variant scoring problems, thus resulting in a significant reduction of sensitivity and precision. We present a novel method that adopts imbalance-aware learning strategies based on resampling techniques and a hyper-ensemble approach that outperforms state-of-the-art methods in two different contexts: the prediction of non-coding variants associated with Mendelian and with complex diseases. We show that imbalance-aware ML is a key issue for the design of robust and accurate prediction algorithms and we provide a method and an easy-to-use software tool that can be effectively applied to this challenging prediction task.
- Published
- 2017
- Full Text
- View/download PDF
29. Alternate-locus aware variant calling in whole genome sequencing.
- Author
-
Jäger M, Schubach M, Zemojtel T, Reinert K, Church DM, and Robinson PN
- Subjects
- Humans, Algorithms, Genetic Variation, Genome, Human, Heterozygote, Sequence Alignment methods, Sequence Analysis, DNA methods
- Abstract
Background: The last two human genome assemblies have extended the previous linear golden-path paradigm of the human genome to a graph-like model to better represent regions with a high degree of structural variability. The new model offers opportunities to improve the technical validity of variant calling in whole-genome sequencing (WGS)., Methods: We developed an algorithm that analyzes the patterns of variant calls in the 178 structurally variable regions of the GRCh38 genome assembly, and infers whether a given sample is most likely to contain sequences from the primary assembly, an alternate locus, or their heterozygous combination at each of these 178 regions. We investigate 121 in-house WGS datasets that have been aligned to the GRCh37 and GRCh38 assemblies., Results: We show that stretches of sequences that are largely but not entirely identical between the primary assembly and an alternate locus can result in multiple variant calls against regions of the primary assembly. In WGS analysis, this results in characteristic and recognizable patterns of variant calls at positions that we term alignable scaffold-discrepant positions (ASDPs). In 121 in-house genomes, on average 51.8±3.8 of the 178 regions were found to correspond best to an alternate locus rather than the primary assembly sequence, and filtering these genomes with our algorithm led to the identification of 7863 variant calls per genome that colocalized with ASDPs. Additionally, we found that 437 of 791 genome-wide association study hits located within one of the regions corresponded to ASDPs., Conclusions: Our algorithm uses the information contained in the 178 structurally variable regions of the GRCh38 genome assembly to avoid spurious variant calls in cases where samples contain an alternate locus rather than the corresponding segment of the primary assembly. 
These results suggest the great potential of fully incorporating the resources of graph-like genome assemblies into variant calling, but also underscore the importance of developing computational resources that will allow a full reconstruction of the genotype in personal genomes. Our algorithm is freely available at https://github.com/charite/asdpex .
- Published
- 2016
- Full Text
- View/download PDF
30. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease.
- Author
-
Smedley D, Schubach M, Jacobsen JOB, Köhler S, Zemojtel T, Spielmann M, Jäger M, Hochheiser H, Washington NL, McMurry JA, Haendel MA, Mungall CJ, Lewis SE, Groza T, Valentini G, and Robinson PN
- Subjects
- Gene Frequency, Genome-Wide Association Study, Humans, Machine Learning, Open Reading Frames genetics, Phenotype, Point Mutation genetics, Algorithms, Genetic Diseases, Inborn genetics, Genome, Human genetics, Mutation genetics
- Abstract
The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants to specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated to specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease., (Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.)
- Published
- 2016
- Full Text
- View/download PDF
31. Strømme Syndrome Is a Ciliary Disorder Caused by Mutations in CENPF.
- Author
-
Filges I, Bruder E, Brandal K, Meier S, Undlien DE, Waage TR, Hoesli I, Schubach M, de Beer T, Sheng Y, Hoeller S, Schulzke S, Røsby O, Miny P, Tercanli S, Oppedal T, Meyer P, Selmer KK, and Strømme P
- Published
- 2016
- Full Text
- View/download PDF
32. Loss-of-function variants in HIVEP2 are a cause of intellectual disability.
- Author
-
Srivastava S, Engels H, Schanze I, Cremer K, Wieland T, Menzel M, Schubach M, Biskup S, Kreiß M, Endele S, Strom TM, Wieczorek D, Zenker M, Gupta S, Cohen J, Zink AM, and Naidu S
- Subjects
- Child, Preschool, Exome, Female, Humans, Infant, Intellectual Disability diagnosis, Male, Young Adult, Codon, Nonsense, DNA-Binding Proteins genetics, Intellectual Disability genetics, Transcription Factors genetics
- Abstract
Intellectual disability (ID) affects 2-3% of the population. In the past, many genetic causes of ID remained unidentified due to its vast heterogeneity. Recently, whole exome sequencing (WES) studies have shown that de novo variants underlie a significant portion of sporadic cases of ID. Applying WES to patients with ID or global developmental delay at different centers, we identified three individuals with distinct de novo variants in HIVEP2 (human immunodeficiency virus type I enhancer binding protein), which belongs to a family of zinc-finger-containing transcriptional proteins involved in growth and development. Two of the variants were nonsense changes, and one was a 1 bp deletion resulting in a premature stop codon that was reported previously without clinical detail. In silico prediction programs suggest loss-of-function in the mutated allele leading to haploinsufficiency as a putative mechanism in all three individuals. All three patients presented with moderate-to-severe ID, minimal structural brain anomalies, hypotonia, and mild dysmorphic features. Growth parameters were in the normal range except for borderline microcephaly at birth in one patient. Two of the patients exhibited behavioral anomalies including hyperactivity and aggression. Published functional data suggest a neurodevelopmental role for HIVEP2, and several of the genes regulated by HIVEP2 are implicated in brain development, for example, SSTR-2, c-Myc, and genes of the NF-κB pathway. In addition, HIVEP2-knockout mice exhibit several working memory deficits, increased anxiety, and hyperactivity. On the basis of the genotype-phenotype correlation and existing functional data, we propose HIVEP2 as a causative ID gene.
- Published
- 2016
- Full Text
- View/download PDF
33. Mutation Detection in Patients with Retinal Dystrophies Using Targeted Next Generation Sequencing.
- Author
-
Weisschuh N, Mayer AK, Strom TM, Kohl S, Glöckle N, Schubach M, Andreasson S, Bernd A, Birch DG, Hamel CP, Heckenlively JR, Jacobson SG, Kamme C, Kellner U, Kunstmann E, Maffei P, Reiff CM, Rohrschneider K, Rosenberg T, Rudolph G, Vámos R, Varsányi B, Weleber RG, and Wissinger B
- Subjects
- DNA Copy Number Variations, Exome, Eye Proteins genetics, Female, Genetic Association Studies, Genetic Heterogeneity, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Male, Mutation Rate, Pedigree, Phenotype, Retinal Dystrophies diagnosis, Mutation, Retinal Dystrophies genetics
- Abstract
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using a RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes.
- Published
- 2016
34. Next-generation diagnostics and disease-gene discovery with the Exomiser.
- Author
-
Smedley D, Jacobsen JO, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, and Robinson PN
- Subjects
- Genetic Testing methods, Humans, Sequence Analysis, DNA methods, Software, Exome, High-Throughput Nucleotide Sequencing methods
- Abstract
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
- Published
- 2015
35. Whole exome sequencing of microdissected splenic marginal zone lymphoma: a study to discover novel tumor-specific mutations.
- Author
-
Peveling-Oberhag J, Wolters F, Döring C, Walter D, Sellmann L, Scholtysik R, Lucioni M, Schubach M, Paulli M, Biskup S, Zeuzem S, Küppers R, and Hansmann ML
- Subjects
- Biomarkers, Tumor genetics, DNA Mutational Analysis, Female, Humans, Male, Microarray Analysis, Middle Aged, Polymorphism, Single Nucleotide, Transcription Factors genetics, Exome genetics, Lymphoma, B-Cell, Marginal Zone genetics, Mutation, Neoplasm Proteins genetics, Splenic Neoplasms genetics
- Abstract
Background: Splenic marginal zone lymphoma (SMZL) is an indolent B-cell non-Hodgkin lymphoma and represents the most common primary malignancy of the spleen. Its precise molecular pathogenesis is still unknown and specific molecular markers for diagnosis or possible targets for causal therapies are lacking. Methods: We performed whole exome sequencing (WES) and copy number analysis from laser-microdissected tumor cells of two primary SMZL discovery cases. Selected somatic single nucleotide variants (SNVs) were analyzed using pyrosequencing and Sanger sequencing in an independent validation cohort. Results: Overall, 25 nonsynonymous somatic SNVs were identified, including known mutations in the NOTCH2 and MYD88 genes. Twenty-three of the mutations have not been associated with SMZL before. Many of these seem to be subclonal. Screening of 24 additional SMZL for mutations at the same positions found mutated in the WES approach revealed no recurrence of mutations for ZNF608 and PDE10A, whereas the MYD88 L265P missense mutation was identified in 15% of cases. An analysis of the NOTCH2 PEST domain and the whole coding region of the transcription factor SMYD1 in eight cases identified no additional case with a NOTCH2 mutation, but two additional cases with SMYD1 alterations. Conclusions: In this first WES approach from microdissected SMZL tissue we confirmed known mutations and discovered new somatic variants. Recurrence of MYD88 mutations in SMZL was validated, but NOTCH2 PEST domain mutations were relatively rare (10% of cases). Recurrent mutations in the transcription factor SMYD1 have not been described in SMZL before and warrant further investigation.
- Published
- 2015
36. From ventriculomegaly to severe muscular atrophy: expansion of the clinical spectrum related to mutations in AIFM1.
- Author
-
Kettwig M, Schubach M, Zimmermann FA, Klinge L, Mayr JA, Biskup S, Sperl W, Gärtner J, and Huppke P
- Subjects
- Adolescent, Adult, Ataxia genetics, Ataxia pathology, Child, Child, Preschool, Family Health, Humans, Infant, Infant, Newborn, Male, Muscular Diseases genetics, Muscular Diseases pathology, Mutant Proteins genetics, Mutant Proteins metabolism, Mutation, Missense, Neurodegenerative Diseases genetics, Neurodegenerative Diseases pathology, Young Adult, Apoptosis Inducing Factor genetics, Apoptosis Inducing Factor metabolism, Genetic Diseases, Inborn genetics, Genetic Diseases, Inborn pathology, Mitochondrial Diseases genetics, Mitochondrial Diseases pathology
- Abstract
The apoptosis-inducing factor (AIF) functions as a FAD-dependent NADH oxidase in mitochondria. Upon apoptotic stimulation it is released from mitochondria and migrates to the nucleus where it induces chromatin condensation and DNA fragmentation. So far, mutations in AIFM1, an X-chromosomal gene coding for AIF, have been described in three families with 11 affected males. We report here on a further patient, thereby expanding the clinical and mutation spectrum. In addition, we review the known phenotypes related to AIFM1 mutations. The clinical course in the male patient described here was characterized by phases with rapid deterioration and long phases without obvious progression of disease. At age 2.5 years he developed hearing loss and severe ataxia, and at age 10 years muscle wasting, swallowing difficulties, respiratory insufficiency and external ophthalmoplegia. By next generation sequencing of the whole exome we identified a hemizygous missense mutation in the AIFM1 gene, c.727G>T (p.Val243Leu), affecting a highly conserved residue in the FAD-binding domain. Summarizing what is known today, mutations in AIFM1 are associated with a progressive disorder with myopathy, ataxia and neuropathy. Severity varies greatly even within one family, with onset of symptoms between birth and adolescence. Three of 12 patients died before age 5 years while others were still able to walk during young adulthood. Less frequent symptoms were hearing loss, seizures and psychomotor regression. Results from clinical chemistry, brain imaging and muscle biopsy were unspecific and inconsistent. (Copyright © 2015. Published by Elsevier B.V.)
- Published
- 2015
37. Germline PTPN11 and somatic PIK3CA variant in a boy with megalencephaly-capillary malformation syndrome (MCAP)--pure coincidence?
- Author
-
Döcker D, Schubach M, Menzel M, Spaich C, Gabriel HD, Zenker M, Bartholdi D, and Biskup S
- Subjects
- Child, Class I Phosphatidylinositol 3-Kinases, Comparative Genomic Hybridization, Consanguinity, Exome, High-Throughput Nucleotide Sequencing, Humans, Male, Models, Biological, Pedigree, Phenotype, Telangiectasis diagnosis, Telangiectasis genetics, Abnormalities, Multiple diagnosis, Abnormalities, Multiple genetics, Genetic Variation, Germ-Line Mutation, Megalencephaly diagnosis, Megalencephaly genetics, Phosphatidylinositol 3-Kinases genetics, Protein Tyrosine Phosphatase, Non-Receptor Type 11 genetics, Skin Diseases, Vascular diagnosis, Skin Diseases, Vascular genetics, Telangiectasis congenital
- Abstract
Megalencephaly-capillary malformation (MCAP) syndrome is an overgrowth syndrome that is diagnosed by clinical criteria. Recently, somatic and germline variants in genes that are involved in the PI3K-AKT pathway (AKT3, PIK3R2 and PIK3CA) have been described to be associated with MCAP and/or other related megalencephaly syndromes. We performed trio-exome sequencing in a 6-year-old boy and his healthy parents. Clinical features were macrocephaly, cutis marmorata, angiomata, asymmetric overgrowth, developmental delay, discrete midline facial nevus flammeus, toe syndactyly and postaxial polydactyly, thus clearly an MCAP phenotype. Exome sequencing revealed a pathogenic de novo germline variant in the PTPN11 gene (c.1529A>G; p.(Gln510Arg)), which has so far been associated with Noonan, as well as LEOPARD syndrome. Whole-exome sequencing (>100× coverage) did not reveal any alteration in the known megalencephaly genes. However, ultra-deep sequencing results from saliva (>1000× coverage) revealed a 22% mosaic variant in PIK3CA (c.2740G>A; p.(Gly914Arg)). To our knowledge, this report is the first description of a PTPN11 germline variant in an MCAP patient. Data from experimental studies show a complex interaction of SHP2 (gene product of PTPN11) and the PI3K-AKT pathway. We hypothesize that certain PTPN11 germline variants might drive toward additional second-hit alterations.
- Published
- 2015
38. Further delineation of the SATB2 phenotype.
- Author
-
Döcker D, Schubach M, Menzel M, Munz M, Spaich C, Biskup S, and Bartholdi D
- Subjects
- Child, Preschool, Chromosome Deletion, Chromosomes, Human, Pair 2, Exome, Facies, Female, Gene Order, Genetic Loci, Genotype, Humans, Mutation, Sequence Analysis, DNA, Genetic Association Studies, Matrix Attachment Region Binding Proteins genetics, Phenotype, Transcription Factors genetics
- Abstract
SATB2 is an evolutionarily highly conserved chromatin remodeling gene located on chromosome 2q33.1. Vertebrate animal models have shown that Satb2 has a crucial role in craniofacial patterning and osteoblast differentiation, as well as in determining the fates of neuronal projections in the developing neocortex. In humans, chromosomal translocations and deletions of 2q33.1 leading to SATB2 haploinsufficiency are associated with cleft palate (CP), facial dysmorphism and intellectual disability (ID). A single patient carrying a nonsense mutation in SATB2 has been described to date. In this study, we performed trio-exome sequencing in a 3-year-old girl with CP and severely delayed speech development, and her unaffected parents. Previously, the girl had undergone conventional and molecular karyotyping (microarray analysis), as well as targeted analysis for different diseases associated with developmental delay, including Angelman syndrome, Rett syndrome and Fragile X syndrome. No diagnosis could be established. Exome sequencing revealed a de novo nonsense mutation in the SATB2 gene (c.715C>T; p.R239*). The identification of a second patient carrying a de novo nonsense mutation in SATB2 confirms that this gene is essential for normal craniofacial patterning and cognitive development. Based on our data and the literature published so far, we propose a new clinically recognizable syndrome - the SATB2-associated syndrome (SAS). SAS is likely to be underdiagnosed and should be considered in children with ID, severe speech delay, cleft or high-arched palate and abnormal dentition with crowded and irregularly shaped teeth.
- Published
- 2014
39. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.
- Author
-
Brownstein CA, Beggs AH, Homer N, Merriman B, Yu TW, Flannery KC, DeChene ET, Towne MC, Savage SK, Price EN, Holm IA, Luquette LJ, Lyon E, Majzoub J, Neupert P, McCallie D Jr, Szolovits P, Willard HF, Mendelsohn NJ, Temme R, Finkel RS, Yum SW, Medne L, Sunyaev SR, Adzhubey I, Cassa CA, de Bakker PI, Duzkale H, Dworzyński P, Fairbrother W, Francioli L, Funke BH, Giovanni MA, Handsaker RE, Lage K, Lebo MS, Lek M, Leshchiner I, MacArthur DG, McLaughlin HM, Murray MF, Pers TH, Polak PP, Raychaudhuri S, Rehm HL, Soemedi R, Stitziel NO, Vestecka S, Supper J, Gugenmus C, Klocke B, Hahn A, Schubach M, Menzel M, Biskup S, Freisinger P, Deng M, Braun M, Perner S, Smith RJ, Andorf JL, Huang J, Ryckman K, Sheffield VC, Stone EM, Bair T, Black-Ziegelbein EA, Braun TA, Darbro B, DeLuca AP, Kolbe DL, Scheetz TE, Shearer AE, Sompallae R, Wang K, Bassuk AG, Edens E, Mathews K, Moore SA, Shchelochkov OA, Trapane P, Bossler A, Campbell CA, Heusel JW, Kwitek A, Maga T, Panzer K, Wassink T, Van Daele D, Azaiez H, Booth K, Meyer N, Segal MM, Williams MS, Tromp G, White P, Corsmeier D, Fitzgerald-Butt S, Herman G, Lamb-Thrush D, McBride KL, Newsom D, Pierson CR, Rakowsky AT, Maver A, Lovrečić L, Palandačić A, Peterlin B, Torkamani A, Wedell A, Huss M, Alexeyenko A, Lindvall JM, Magnusson M, Nilsson D, Stranneheim H, Taylan F, Gilissen C, Hoischen A, van Bon B, Yntema H, Nelen M, Zhang W, Sager J, Zhang L, Blair K, Kural D, Cariaso M, Lennon GG, Javed A, Agrawal S, Ng PC, Sandhu KS, Krishna S, Veeramachaneni V, Isakov O, Halperin E, Friedman E, Shomron N, Glusman G, Roach JC, Caballero J, Cox HC, Mauldin D, Ament SA, Rowen L, Richards DR, San Lucas FA, Gonzalez-Garay ML, Caskey CT, Bai Y, Huang Y, Fang F, Zhang Y, Wang Z, Barrera J, Garcia-Lobo JM, González-Lamuño D, Llorca J, Rodriguez MC, Varela I, Reese MG, De La Vega FM, Kiruluta E, Cargill M, Hart RK, Sorenson JM, Lyon GJ, Stevenson DA, Bray BE, Moore BM, Eilbeck K, Yandell M, Zhao H, Hou L, Chen X, Yan X, Chen M, Li C, Yang C, Gunel M, Li P, Kong Y, Alexander AC, Albertyn ZI, Boycott KM, Bulman DE, Gordon PM, Innes AM, Knoppers BM, Majewski J, Marshall CR, Parboosingh JS, Sawyer SL, Samuels ME, Schwartzentruber J, Kohane IS, and Margulies DM
- Subjects
- Child, Female, Financing, Organized, Genetic Testing economics, Genetic Testing standards, Genomics economics, Genomics standards, Heart Defects, Congenital diagnosis, Heart Defects, Congenital genetics, Humans, Male, Myopathies, Structural, Congenital diagnosis, Myopathies, Structural, Congenital genetics, Sequence Analysis, DNA economics, Sequence Analysis, DNA standards, Databases, Genetic standards, Genetic Testing methods, Genomics methods, Peer Review, Research, Sequence Analysis, DNA methods
- Abstract
Background: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. Results: A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. Conclusions: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
- Published
- 2014
40. Panel-based next generation sequencing as a reliable and efficient technique to detect mutations in unselected patients with retinal dystrophies.
- Author
-
Glöckle N, Kohl S, Mohr J, Scheurenbrand T, Sprecher A, Weisschuh N, Bernd A, Rudolph G, Schubach M, Poloschek C, Zrenner E, Biskup S, Berger W, Wissinger B, and Neidhardt J
- Subjects
- ATP-Binding Cassette Transporters genetics, Exons, Extracellular Matrix Proteins genetics, Eye Proteins genetics, High-Throughput Nucleotide Sequencing, Humans, Mutation, Pedigree, Retinal Dystrophies etiology, Retinal Dystrophies pathology, Retinitis Pigmentosa etiology, Sequence Analysis, DNA, Usher Syndromes etiology, Usher Syndromes pathology, Genetic Predisposition to Disease, Pathology, Molecular, Retinal Dystrophies genetics, Retinitis Pigmentosa genetics, Usher Syndromes genetics
- Abstract
Hereditary retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different forms of RD can be caused by mutations in >100 genes, including >1600 exons. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. So far, NGS is not routinely used in gene diagnostics. We developed a diagnostic NGS pipeline to identify mutations in 170 genetically and clinically unselected RD patients. NGS was applied to 105 RD-associated genes. Underrepresented regions were examined by Sanger sequencing. The NGS approach was successfully established using cases with known sequence alterations. Depending on the initial clinical diagnosis, we identified likely causative mutations in 55% of retinitis pigmentosa and 80% of Bardet-Biedl or Usher syndrome cases. Seventy-one novel mutations in 40 genes were newly associated with RD. The genes USH2A, EYS, ABCA4, and RHO were more frequently affected than others. Occasionally, cases carried mutations in more than one RD-associated gene. In addition, we found possible dominant de-novo mutations in cases with sporadic RD, which implies consequences for counseling of patients and families. NGS-based mutation analyses are reliable and cost-efficient approaches in gene diagnostics of genetically heterogeneous diseases like RD.
- Published
- 2014
41. Targeted next generation sequencing as a diagnostic tool in epileptic disorders.
- Author
-
Lemke JR, Riesch E, Scheurenbrand T, Schubach M, Wilhelm C, Steiner I, Hansen J, Courage C, Gallati S, Bürki S, Strozzi S, Simonetti BG, Grunt S, Steinlin M, Alber M, Wolff M, Klopstock T, Prott EC, Lorenz R, Spaich C, Rona S, Lakshminarasimhan M, Kröll J, Dorn T, Krämer G, Synofzik M, Becker F, Weber YG, Lerche H, Böhm D, and Biskup S
- Subjects
- Adolescent, Adult, Child, Child, Preschool, Epilepsy diagnosis, Female, Genes genetics, Genetic Predisposition to Disease, Genotype, Humans, Male, Mutation genetics, Phenotype, Sequence Analysis, DNA, Tripeptidyl-Peptidase 1, Young Adult, Epilepsy genetics
- Abstract
Purpose: Epilepsies have a highly heterogeneous background with a strong genetic contribution. The variety of unspecific and overlapping syndromic and nonsyndromic phenotypes often hampers a clear clinical diagnosis and prevents straightforward genetic testing. Knowing the genetic basis of a patient's epilepsy can be valuable not only for diagnosis but also for guiding treatment and estimating recurrence risks. Methods: To overcome these diagnostic restrictions, we composed a panel of genes for Next Generation Sequencing containing the most relevant epilepsy genes and covering the most relevant epilepsy phenotypes known so far. With this method, 265 genes were analyzed per patient in a single step. We evaluated this panel on a pilot cohort of 33 index patients with concise epilepsy phenotypes or with a severe but unspecific seizure disorder covering both sporadic and familial cases. Key Findings: We identified presumed disease-causing mutations in 16 of 33 patients comprising sequence alterations in frequently as well as in less commonly affected genes. The detected aberrations encompassed known and unknown point mutations (SCN1A p.R222X, p.E289V, p.379R, p.R393H; SCN2A p.V208E; STXBP1 p.R122X; KCNJ10 p.L68P, p.I129V; KCTD7 p.L108M; KCNQ3 p.P574S; ARHGEF9 p.R290H; SMS p.F58L; TPP1 p.Q278R, p.Q422H; MFSD8 p.T294K), a putative splice site mutation (SCN1A c.693A> p.T/P231P) and small deletions (SCN1A p.F1330Lfs3X [1 bp]; MFSD8 p.A138Dfs10X [7 bp]). All mutations have been confirmed by conventional Sanger sequencing and, where possible, validated by parental testing and segregation analysis. In three patients with either Dravet syndrome or myoclonic epilepsy, we detected SCN1A mutations (p.R222X, p.P231P, p.R393H), even though other laboratories had previously excluded aberrations of this gene by Sanger sequencing or high-resolution melting analysis. Significance: We have developed a fast and cost-efficient diagnostic screening method to analyze the genetic basis of epilepsies. We were able to detect mutations in patients with clear and with unspecific epilepsy phenotypes and to uncover the genetic basis of many so far unresolved cases with epilepsy, including mutation detection in cases in which previous conventional methods yielded falsely negative results. Our approach thus proved to be a powerful diagnostic tool that may contribute to collecting information on both common and unknown epileptic disorders and to delineating associated phenotypes of less frequently mutated genes. (Wiley Periodicals, Inc. © 2012 International League Against Epilepsy.)
- Published
- 2012