355 results on '"Segmental duplications"'
Search Results
2. A multilocus approach for accurate variant calling in low-copy repeats using whole-genome sequencing
- Author
-
Prodanov, Timofey and Bansal, Vikas
- Subjects
Human Genome, Genetics, 2.1 Biological and endogenous factors, Aetiology, Generic health relevance, Good Health and Well Being, Humans, Segmental Duplications, Genomic, DNA Copy Number Variations, Whole Genome Sequencing, Benchmarking, Genome, Human, Mathematical Sciences, Biological Sciences, Information and Computing Sciences, Bioinformatics
- Abstract
Motivation: Low-copy repeats (LCRs) or segmental duplications are long segments of duplicated DNA that cover >5% of the human genome. Existing tools for variant calling using short reads exhibit low accuracy in LCRs due to ambiguity in read mapping and extensive copy number variation. Variants in more than 150 genes overlapping LCRs are associated with risk for human diseases. Methods: We describe a short-read variant calling method, ParascopyVC, that performs variant calling jointly across all repeat copies and utilizes reads independent of mapping quality in LCRs. To identify candidate variants, ParascopyVC aggregates reads mapped to different repeat copies and performs polyploid variant calling. Subsequently, paralogous sequence variants that can differentiate repeat copies are identified using population data and used for estimating the genotype of variants for each repeat copy. Results: On simulated whole-genome sequence data, ParascopyVC achieved higher precision (0.997) and recall (0.807) than three state-of-the-art variant callers (best precision = 0.956 for DeepVariant and best recall = 0.738 for GATK) in 167 LCR regions. Benchmarking of ParascopyVC using the genome-in-a-bottle high-confidence variant calls for HG002 genome showed that it achieved a very high precision of 0.991 and a high recall of 0.909 across LCR regions, significantly better than FreeBayes (precision = 0.954 and recall = 0.822), GATK (precision = 0.888 and recall = 0.873) and DeepVariant (precision = 0.983 and recall = 0.861). ParascopyVC demonstrated a consistently higher accuracy (mean F1 = 0.947) than other callers (best F1 = 0.908) across seven human genomes. Availability and implementation: ParascopyVC is implemented in Python and is freely available at https://github.com/tprodanov/ParascopyVC.
- Published
- 2023
3. Increased mutation and gene conversion within human segmental duplications
- Author
-
Vollger, Mitchell R, Dishuck, Philip C, Harvey, William T, DeWitt, William S, Guitart, Xavi, Goldberg, Michael E, Rozanski, Allison N, Lucas, Julian, Asri, Mobin, Munson, Katherine M, Lewis, Alexandra P, Hoekzema, Kendra, Logsdon, Glennis A, Porubsky, David, Paten, Benedict, Harris, Kelley, Hsieh, PingHsun, and Eichler, Evan E
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, 2.1 Biological and endogenous factors, Humans, Gene Conversion, Genome, Human, Mutation, Segmental Duplications, Genomic, Polymorphism, Single Nucleotide, Haplotypes, Exons, Cytosine, Guanine, CpG Islands, Human Pangenome Reference Consortium, General Science & Technology
- Abstract
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
- Published
- 2023
4. Gaps and complex structurally variant loci in phased genome assemblies
- Author
-
Porubsky, David, Vollger, Mitchell R, Harvey, William T, Rozanski, Allison N, Ebert, Peter, Hickey, Glenn, Hasenfeld, Patrick, Sanders, Ashley D, Stober, Catherine, Consortium, Human Pangenome Reference, Korbel, Jan O, Paten, Benedict, Marschall, Tobias, Eichler, Evan E, Abel, Haley J, Antonacci-Fulton, Lucinda L, Asri, Mobin, Baid, Gunjan, Baker, Carl A, Belyaeva, Anastasiya, Billis, Konstantinos, Bourque, Guillaume, Buonaiuto, Silvia, Carroll, Andrew, Chaisson, Mark JP, Chang, Pi-Chuan, Chang, Xian H, Cheng, Haoyu, Chu, Justin, Cody, Sarah, Colonna, Vincenza, Cook, Daniel E, Cook-Deegan, Robert M, Cornejo, Omar E, Diekhans, Mark, Doerr, Daniel, Ebler, Jana, Eizenga, Jordan M, Fairley, Susan, Fedrigo, Olivier, Felsenfeld, Adam L, Feng, Xiaowen, Fischer, Christian, Flicek, Paul, Formenti, Giulio, Frankish, Adam, Fulton, Robert S, Gao, Yan, Garg, Shilpa, Garrison, Erik, Garrison, Nanibaa’ A, Giron, Carlos Garcia, Green, Richard E, Groza, Cristian, Guarracino, Andrea, Haggerty, Leanne, Hall, Ira M, Haukness, Marina, Haussler, David, Heumos, Simon, Hoekzema, Kendra, Hourlier, Thibaut, Howe, Kerstin, Jain, Miten, Jarvis, Erich D, Ji, Hanlee P, Kenny, Eimear E, Koenig, Barbara A, Kolesnikov, Alexey, Kordosky, Jennifer, Koren, Sergey, Lee, HoJoon, Lewis, Alexandra P, Li, Heng, Liao, Wen-Wei, Lu, Shuangjia, Lu, Tsung-Yu, Lucas, Julian K, Magalhães, Hugo, Marco-Sola, Santiago, Marijon, Pierre, Markello, Charles, Martin, Fergal J, McCartney, Ann, McDaniel, Jennifer, Miga, Karen H, Mitchell, Matthew W, Monlong, Jean, Mountcastle, Jacquelyn, Munson, Katherine M, Mwaniki, Moses Njagi, Nattestad, Maria, Novak, Adam M, and Nurk, Sergey
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Human Genome, Humans, DNA, Satellite, Polymorphism, Genetic, Haplotypes, Segmental Duplications, Genomic, Sequence Analysis, DNA, Human Pangenome Reference Consortium, Medical and Health Sciences, Bioinformatics
- Abstract
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
- Published
- 2023
5. A groundbreaking achievement in genomics: the complete telomere-to-telomere sequence of a mouse genome
- Author
-
Zhu, Wei-Guo
- Published
- 2025
6. High level of complexity and global diversity of the 3q29 locus revealed by optical mapping and long-read sequencing
- Author
-
Yilmaz, Feyza, Gurusamy, Umamaheswaran, Mosley, Trenell J, Hallast, Pille, Kim, Kwondo, Mostovoy, Yulia, Purcell, Ryan H, Shaikh, Tamim H, Zwick, Michael E, Kwok, Pui-Yan, Lee, Charles, and Mulle, Jennifer G
- Subjects
Biological Sciences, Genetics, Clinical Research, Human Genome, Biotechnology, 2.1 Biological and endogenous factors, Humans, Segmental Duplications, Genomic, Chromosome Mapping, Genomics, Syndrome, Haplotypes, DNA Copy Number Variations, 3q29, Structural variations, Genomic disorders, Schizophrenia, NAHR, Copy number variant(s), Clinical Sciences
- Abstract
Background: High sequence identity between segmental duplications (SDs) can facilitate copy number variants (CNVs) via non-allelic homologous recombination (NAHR). These CNVs are one of the fundamental causes of genomic disorders such as the 3q29 deletion syndrome (del3q29S). There are 21 protein-coding genes lost or gained as a result of such recurrent 1.6-Mbp deletions or duplications, respectively, in the 3q29 locus. While NAHR plays a role in CNV occurrence, the factors that increase the risk of NAHR at this particular locus are not well understood. Methods: We employed an optical genome mapping technique to characterize the 3q29 locus in 161 unaffected individuals, 16 probands with del3q29S and their parents, and 2 probands with the 3q29 duplication syndrome (dup3q29S). Long-read sequencing-based, haplotype-resolved de novo assemblies from 44 unaffected individuals and 1 trio were used for orthogonal validation of haplotypes and deletion breakpoints. Results: In total, we discovered 34 haplotypes, of which 19 were novel haplotypes. Among these 19 novel haplotypes, 18 were detected in unaffected individuals, while 1 novel haplotype was detected on the parent-of-origin chromosome of a proband with the del3q29S. Phased assemblies from 44 unaffected individuals enabled the orthogonal validation of 20 haplotypes. In 89% (16/18) of the probands, breakpoints were confined to paralogous copies of a 20-kbp segment within the 3q29 SDs. In one del3q29S proband, the breakpoint was confined to a 374-bp region using long-read sequencing. Furthermore, we categorized del3q29S cases into three classes and dup3q29S cases into two classes based on breakpoints. Finally, we found no evidence of inversions in parent-of-origin chromosomes. Conclusions: We have generated the most comprehensive haplotype map for the 3q29 locus using unaffected individuals, probands with del3q29S or dup3q29S, and available parents, and also determined the deletion breakpoint to be within a 374-bp region in one proband with del3q29S. These results should provide a better understanding of the underlying genetic architecture that contributes to the etiology of del3q29S and dup3q29S.
- Published
- 2023
7. A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes.
- Author
-
Toh, Huishi, Yang, Chentao, Formenti, Giulio, Raja, Kalpana, Yan, Lily, Tracey, Alan, Chow, William, Howe, Kerstin, Bergeron, Lucie, Zhang, Guojie, Haase, Bettina, Mountcastle, Jacquelyn, Fedrigo, Olivier, Fogg, John, Kirilenko, Bogdan, Munegowda, Chetan, Hiller, Michael, Jain, Aashish, Kihara, Daisuke, Rhie, Arang, Phillippy, Adam, Swanson, Scott, Jiang, Peng, Jarvis, Erich, Thomson, James, Stewart, Ron, Chaisson, Mark, Bukhman, Yury, and Clegg, Dennis
- Subjects
Arvicanthis niloticus, Diabetes, Diurnal, Genome, Germline mutation rate, Heterozygosity, Long-read genome assembly, Orthology, Positive selection, Retrogenes, Segmental duplications, Humans, Animals, Haplotypes, Diabetes Mellitus, Type 2, Murinae, Genome, Genomics
- Abstract
BACKGROUND: The Nile rat (Arvicanthis niloticus) is an important animal model because of its robust diurnal rhythm, a cone-rich retina, and a propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these aspects, compared to the widely used Mus musculus and Rattus norvegicus models, holds the promise of better translation of research findings to the clinic. RESULTS: We report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with contig N50 of 11.1 Mb, scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including some that affect genes associated with type 2 diabetes and metabolic dysfunctions. We discuss 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse. CONCLUSIONS: Our findings reflect the exceptional level of genomic resolution present in this assembly, which will greatly expand the potential of the Nile rat as a model organism.
- Published
- 2022
8. Segmental duplications drive the evolution of accessory regions in a major crop pathogen.
- Author
-
van Westerhoven, Anouk C., Aguilera‐Galvez, Carolina, Nakasato‐Tagami, Giuliana, Shi‐Kunne, Xiaoqian, Martinez de la Parte, Einar, Chavarro‐Carrero, Edgar, Meijer, Harold J. G., Feurtey, Alice, Maryani, Nani, Ordóñez, Nadia, Schneiders, Harrie, Nijbroek, Koen, Wittenberg, Alexander H. J., Hofstede, Rene, García‐Bastidas, Fernando, Sørensen, Anker, Swennen, Ronny, Drenth, Andre, Stukenbrock, Eva H., and Kema, Gert H. J.
- Subjects
*FUSARIUM wilt of banana, *BANANAS, *PHYTOPATHOGENIC microorganisms, *FUSARIUM oxysporum, *PAN-genome, *CHROMOSOMES
- Abstract
Summary: Many pathogens evolved compartmentalized genomes with conserved core and variable accessory regions (ARs) that carry effector genes mediating virulence. The fungal plant pathogen Fusarium oxysporum has such ARs, often spanning entire chromosomes. The presence of specific ARs influences the host range, and horizontal transfer of ARs can modify the pathogenicity of the receiving strain. However, how these ARs evolve in strains that infect the same host remains largely unknown. We defined the pan‐genome of 69 diverse F. oxysporum strains that cause Fusarium wilt of banana, a significant constraint to global banana production, and analyzed the diversity and evolution of the ARs. Accessory regions in F. oxysporum strains infecting the same banana cultivar are highly diverse, and we could not identify any shared genomic regions and in planta‐induced effectors. We demonstrate that segmental duplications drive the evolution of ARs. Furthermore, we show that recent segmental duplications specifically in accessory chromosomes cause the expansion of ARs in F. oxysporum. Taken together, we conclude that extensive recent duplications drive the evolution of ARs in F. oxysporum, which contribute to the evolution of virulence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography.
- Author
-
Bukhman, Yury V, Morin, Phillip A, Meyer, Susanne, Chu, Li-Fang, Jacobsen, Jeff K, Antosiewicz-Bourget, Jessica, Mamott, Daniel, Gonzales, Maylie, Argus, Cara, Bolin, Jennifer, Berres, Mark E, Fedrigo, Olivier, Steill, John, Swanson, Scott A, Jiang, Peng, Rhie, Arang, Formenti, Giulio, Phillippy, Adam M, Harris, Robert S, and Wood, Jonathan M D
- Subjects
BLUE whale, DEMOGRAPHY, GENOME size, GENOMES, INTERGLACIALS, HETEROZYGOSITY
- Abstract
The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing
- Author
-
Prodanov, Timofey and Bansal, Vikas
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Prevention, Biodefense, Biotechnology, Human Genome, Vaccine Related, Generic health relevance, DNA Copy Number Variations, Genome, Human, High-Throughput Nucleotide Sequencing, Humans, Segmental Duplications, Genomic, Sequence Analysis, DNA, Whole Genome Sequencing
- Abstract
The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.
- Published
- 2022
11. PhaseDancer: a novel targeted assembler of segmental duplications unravels the complexity of the human chromosome 2 fusion going from 48 to 46 chromosomes in hominin evolution
- Author
-
Barbara Poszewiecka, Krzysztof Gogolewski, Justyna A. Karolak, Paweł Stankiewicz, and Anna Gambin
- Subjects
De-novo assembly, Segmental duplications, Long-read PacBio sequencing, Chromosomal fusion, Complex genomic rearrangements, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler PhaseDancer that extends SD-rich regions of interest iteratively. We validate its robustness and efficiency using a gold-standard set of human BAC clones and in silico-generated SDs with predefined evolutionary scenarios. PhaseDancer enables extension of the incomplete complex SD-rich subtelomeric regions of Great Ape chromosomes orthologous to the human chromosome 2 (HSA2) fusion site, informing a model of HSA2 formation and unravelling the evolution of human and Great Ape genomes.
- Published
- 2023
12. Structural Variation Evolution at the 15q11-q13 Disease-Associated Locus.
- Author
-
Paparella, Annalisa, L'Abbate, Alberto, Palmisano, Donato, Chirico, Gerardina, Porubsky, David, Catacchio, Claudia R., Ventura, Mario, Eichler, Evan E., Maggiolini, Flavia A. M., and Antonacci, Francesca
- Subjects
*RECURRENT neural networks, *DEVELOPMENTAL delay, *LOCUS (Genetics), *DNA copy number variations, *HUMAN evolution
- Abstract
The impact of segmental duplications on human evolution and disease is only just starting to unfold, thanks to advancements in sequencing technologies that allow for their discovery and precise genotyping. The 15q11-q13 locus is a hotspot of recurrent copy number variation associated with Prader–Willi/Angelman syndromes, developmental delay, autism, and epilepsy and is mediated by complex segmental duplications, many of which arose recently during evolution. To gain insight into the instability of this region, we characterized its architecture in human and nonhuman primates, reconstructing the evolutionary history of five different inversions that rearranged the region in different species primarily by accumulation of segmental duplications. Comparative analysis of human and nonhuman primate duplication structures suggests a human-specific gain of directly oriented duplications in the regions flanking the GOLGA cores and HERC segmental duplications, representing potential genomic drivers for the human-specific expansions. The increasing complexity of segmental duplication organization over the course of evolution underlies its association with human susceptibility to recurrent disease-associated rearrangements. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. Diverse Molecular Mechanisms Contribute to Differential Expression of Human Duplicated Genes
- Author
-
Shew, Colin J, Carmona-Mora, Paulina, Soto, Daniela C, Mastoras, Mira, Roberts, Elizabeth, Rosas, Joseph, Jagannathan, Dhriti, Kaya, Gulhan, O’Geen, Henriette, and Dennis, Megan Y
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Human Genome, 1.1 Normal biological development and functioning, Generic health relevance, Animals, Cell Line, DNA Copy Number Variations, Gene Duplication, Gene Expression Regulation, Genome, Human, Humans, Pan troglodytes, Promoter Regions, Genetic, Segmental Duplications, Genomic, gene duplication, gene regulation, primate evolution, Biochemistry and Cell Biology, Evolutionary Biology, Biochemistry and cell biology, Evolutionary biology
- Abstract
Emerging evidence links genes within human-specific segmental duplications (HSDs) to traits and diseases unique to our species. Strikingly, despite being nearly identical by sequence (>98.5%), paralogous HSD genes are differentially expressed across human cell and tissue types, though the underlying mechanisms have not been examined. We compared cross-tissue mRNA levels of 75 HSD genes from 30 families between humans and chimpanzees and found expression patterns consistent with relaxed selection on or neofunctionalization of derived paralogs. In general, ancestral paralogs exhibited greatest expression conservation with chimpanzee orthologs, though exceptions suggest certain derived paralogs may retain or supplant ancestral functions. Concordantly, analysis of long-read isoform sequencing data sets from diverse human tissues and cell lines found that about half of derived paralogs exhibited globally lower expression. To understand mechanisms underlying these differences, we leveraged data from human lymphoblastoid cell lines (LCLs) and found no relationship between paralogous expression divergence and post-transcriptional regulation, sequence divergence, or copy-number variation. Considering cis-regulation, we reanalyzed ENCODE data and recovered hundreds of previously unidentified candidate CREs in HSDs. We also generated large-insert ChIP-sequencing data for active chromatin features in an LCL to better distinguish paralogous regions. Some duplicated CREs were sufficient to drive differential reporter activity, suggesting they may contribute to divergent cis-regulation of paralogous genes. This work provides evidence that cis-regulatory divergence contributes to novel expression patterns of recent gene duplicates in humans.
- Published
- 2021
14. Genomic regions associated with microdeletion/microduplication syndromes exhibit extreme diversity of structural variation
- Author
-
Mostovoy, Yulia, Yilmaz, Feyza, Chow, Stephen K, Chu, Catherine, Lin, Chin, Geiger, Elizabeth A, Meeks, Naomi JL, Chatfield, Kathryn C, Coughlin, Curtis R, Surti, Urvashi, Kwok, Pui-Yan, and Shaikh, Tamim H
- Subjects
Biological Sciences, Genetics, Biotechnology, 1.1 Normal biological development and functioning, Chromosome Breakpoints, Chromosome Deletion, Chromosome Disorders, Chromosomes, Human, Pair 15, Chromosomes, Human, Pair 16, Craniofacial Abnormalities, Developmental Disabilities, Genomic Structural Variation, Heart Defects, Congenital, Humans, Intellectual Disability, Mental Disorders, Segmental Duplications, Genomic, Seizures, Williams Syndrome, segmental duplications, genome mapping, structural variation, genomic disorders, Developmental Biology, Biochemistry and cell biology
- Abstract
Segmental duplications (SDs) are a class of long, repetitive DNA elements whose paralogs share a high level of sequence similarity with each other. SDs mediate chromosomal rearrangements that lead to structural variation in the general population as well as genomic disorders associated with multiple congenital anomalies, including the 7q11.23 (Williams-Beuren Syndrome, WBS), 15q13.3, and 16p12.2 microdeletion syndromes. Population-level characterization of SDs has generally been lacking because most techniques used for analyzing these complex regions are both labor and cost intensive. In this study, we have used a high-throughput technique to genotype complex structural variation with a single molecule, long-range optical mapping approach. We characterized SDs and identified novel structural variants (SVs) at 7q11.23, 15q13.3, and 16p12.2 using optical mapping data from 154 phenotypically normal individuals from 26 populations comprising five super-populations. We detected several novel SVs for each locus, some of which had significantly different prevalence between populations. Additionally, we localized the microdeletion breakpoints to specific paralogous duplicons located within complex SDs in two patients with WBS, one patient with 15q13.3, and one patient with 16p12.2 microdeletion syndromes. The population-level data presented here highlights the extreme diversity of large and complex SVs within SD-containing regions. The approach we outline will greatly facilitate the investigation of the role of inter-SD structural variation as a driver of chromosomal rearrangements and genomic disorders.
- Published
- 2021
15. Genomic Complexity and Complex Chromosomal Rearrangements in Genetic Diagnosis: Two Illustrative Cases on Chromosome 7.
- Author
-
Villa, Nicoletta, Redaelli, Serena, Farina, Stefania, Conconi, Donatella, Sala, Elena Maria, Crosti, Francesca, Mariani, Silvana, Colombo, Carla Maria, Dalprà, Leda, Lavitrano, Marialuisa, Bentivegna, Angela, and Roversi, Gaia
- Subjects
*CHROMOSOMAL rearrangement, *CHROMOSOMES, *GENETIC disorder diagnosis, *CHROMOSOME segregation, *KARYOTYPES, *TRISOMY, *CYTOGENETICS
- Abstract
Complex chromosomal rearrangements are rare events compatible with survival, consisting of an imbalance and/or position effect of one or more genes, that contribute to a range of clinical presentations. The investigation and diagnosis of these cases are often difficult. The interpretation of the pattern of pairing and segregation of these chromosomes during meiosis is important for the assessment of the risk and the type of imbalance in the offspring. Here, we investigated two unrelated pediatric carriers of complex rearrangements of chromosome 7. The first case was a 2-year-old girl with a severe phenotype. Conventional cytogenetics evidenced a duplication of part of the short arm of chromosome 7. By array-CGH analysis, we found a complex rearrangement with three discontinuous trisomy regions (7p22.1p21.3, 7p21.3, and 7p21.3p15.3). The second case was a newborn investigated for hypodevelopment and dysmorphisms. The karyotype analysis promptly revealed a structurally altered chromosome 7. The array-CGH analysis identified an even more complex rearrangement consisting of a trisomic region at 7q11.23q22 and a tetrasomic region of 4.5 Mb spanning 7q21.3 to q22.1. The mother's karyotype examination revealed a complex rearrangement of chromosome 7: the 7q11.23q22 region was inserted in the short arm at 7p15.3. Finally, array-CGH analysis showed a trisomic region that corresponds to the tetrasomic region of the son. Our work proved that the integration of several technical solutions is often required to appropriately analyze complex chromosomal rearrangements in order to understand their implications and offer appropriate genetic counseling. [ABSTRACT FROM AUTHOR]
- Published
- 2023
16. Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications
- Author
-
Prodanov, Timofey and Bansal, Vikas
- Subjects
Human Genome, Genetics, Generic health relevance, Algorithms, Databases, Genetic, Datasets as Topic, Genome, Human, High-Throughput Nucleotide Sequencing, Humans, Segmental Duplications, Genomic, Sequence Analysis, DNA, Software, Environmental Sciences, Biological Sciences, Information and Computing Sciences, Developmental Biology
- Abstract
The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs), that is, sequence differences between paralogous sequences, to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (from 74.3% to 90.6%) and BLASR (from 82.9% to 90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8-21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14,713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.
- Published
- 2020
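The idea in the abstract above, using PSVs to decide between candidate alignment locations, can be illustrated with a toy calculation. The sketch below is not DuploMap's algorithm or API. It assumes a uniform prior over repeat copies, a single fixed per-base error rate, and independent PSVs. The names `psv_log_likelihood`, `choose_copy`, `copyA` and `copyB` are hypothetical.

```python
import math


def psv_log_likelihood(read_alleles, copy_alleles, error_rate=0.01):
    """Log-likelihood that a read comes from one repeat copy (toy model).

    read_alleles: dict mapping PSV id -> base observed in the read
    copy_alleles: dict mapping PSV id -> base this copy carries at that PSV
    error_rate:   assumed per-base sequencing error rate (hypothetical value)
    """
    ll = 0.0
    for psv, base in read_alleles.items():
        expected = copy_alleles.get(psv)
        if expected is None:
            continue  # PSV not defined for this copy; it carries no information
        if base == expected:
            ll += math.log(1.0 - error_rate)
        else:
            ll += math.log(error_rate / 3.0)  # error to one specific other base
    return ll


def choose_copy(read_alleles, copies, error_rate=0.01):
    """Return (best_copy_name, posterior) assuming a uniform prior over copies."""
    lls = {name: psv_log_likelihood(read_alleles, alleles, error_rate)
           for name, alleles in copies.items()}
    top = max(lls.values())
    weights = {name: math.exp(ll - top) for name, ll in lls.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


# Two hypothetical repeat copies that differ at psv1 and psv3 but not psv2.
copies = {
    "copyA": {"psv1": "A", "psv2": "C", "psv3": "G"},
    "copyB": {"psv1": "T", "psv2": "C", "psv3": "A"},
}
read = {"psv1": "A", "psv3": "G"}
best, posterior = choose_copy(read, copies)
print(best, round(posterior, 6))
```

A read that overlaps only positions where the copies agree (here `psv2`) gets equal posteriors for both copies. That is the situation in which a mapper cannot place the read with high confidence.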
17. Evolution of Human Brain Size-Associated NOTCH2NL Genes Proceeds toward Reduced Protein Levels.
- Author
-
Lodewijk, Gerrald, Fernandes, Diana, Vretzakis, Iraklis, Savage, Jeanne, and Jacobs, Frank
- Subjects
Neanderthal ,archaic genomes ,brain size ,gene conversion ,human evolutionary genomics ,human-specific genes ,segmental duplications ,Animals ,Biological Evolution ,Genome ,Human ,Genomic Structural Variation ,Humans ,Multigene Family ,Neanderthals ,Receptor ,Notch2 - Abstract
Ever since the availability of genomes from Neanderthals, Denisovans, and ancient humans, the field of evolutionary genomics has been searching for protein-coding variants that may hold clues to how our species evolved over the last ∼600,000 years. In this study, we identify such variants in the human-specific NOTCH2NL gene family, which were recently identified as possible contributors to the evolutionary expansion of the human brain. We find evidence for the existence of unique protein-coding NOTCH2NL variants in Neanderthals and Denisovans which could affect their ability to activate Notch signaling. Furthermore, in the Neanderthal and Denisovan genomes, we find unusual NOTCH2NL configurations, not found in any of the modern human genomes analyzed. Finally, genetic analysis of archaic and modern humans reveals ongoing adaptive evolution of modern human NOTCH2NL genes, identifying three structural variants acting complementary to drive our genome to produce a lower dosage of NOTCH2NL protein. Because copy-number variations of the 1q21.1 locus, encompassing NOTCH2NL genes, are associated with severe neurological disorders, this seemingly contradicting drive toward low levels of NOTCH2NL protein indicates that the optimal dosage of NOTCH2NL may have not yet been settled in the human population.
- Published
- 2020
18. Hi-C Identifies Complex Genomic Rearrangements and TAD-Shuffling in Developmental Diseases
- Author
-
Melo, Uirá Souto, Schöpflin, Robert, Acuna-Hidalgo, Rocio, Mensah, Martin Atta, Fischer-Zirnsak, Björn, Holtgrewe, Manuel, Klever, Marius-Konstantin, Türkmen, Seval, Heinrich, Verena, Pluym, Ilina Datkhaeva, Matoso, Eunice, Bernardo de Sousa, Sérgio, Louro, Pedro, Hülsemann, Wiebke, Cohen, Monika, Dufke, Andreas, Latos-Bieleńska, Anna, Vingron, Martin, Kalscheuer, Vera, Quintero-Rivera, Fabiola, Spielmann, Malte, and Mundlos, Stefan
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Biomedical and Clinical Sciences ,Genetics ,Biotechnology ,Human Genome ,Chromatin Assembly and Disassembly ,Chromosome Breakpoints ,Chromosomes ,Human ,Cohort Studies ,Developmental Disabilities ,Genome ,Human ,Humans ,Molecular Conformation ,SOX9 Transcription Factor ,Segmental Duplications ,Genomic ,Translocation ,Genetic ,Hi-C ,chromosome conformation capture ,cytogenetics ,developmental disorders ,ectopic enhancer-promoter interactions ,gene misregulation ,neo-TAD ,topologically associating domains ,Medical and Health Sciences ,Genetics & Heredity ,Biological sciences ,Biomedical and clinical sciences ,Health sciences - Abstract
Genome-wide analysis methods, such as array comparative genomic hybridization (CGH) and whole-genome sequencing (WGS), have greatly advanced the identification of structural variants (SVs) in the human genome. However, even with standard high-throughput sequencing techniques, complex rearrangements with multiple breakpoints are often difficult to resolve, and predicting their effects on gene expression and phenotype remains a challenge. Here, we address these problems by using high-throughput chromosome conformation capture (Hi-C) generated from cultured cells of nine individuals with developmental disorders (DDs). Three individuals had previously been identified as harboring duplications at the SOX9 locus and six had been identified with translocations. Hi-C resolved the positions of the duplications and was instructive in interpreting their distinct pathogenic effects, including the formation of new topologically associating domains (neo-TADs). Hi-C was very sensitive in detecting translocations, and it revealed previously unrecognized complex rearrangements at the breakpoints. In several cases, we observed the formation of fused-TADs promoting ectopic enhancer-promoter interactions that were likely to be involved in the disease pathology. In summary, we show that Hi-C is a sensible method for the detection of complex SVs in a clinical setting. The results help interpret the possible pathogenic effects of the SVs in individuals with DDs.
- Published
- 2020
19. Low copy repeats in the genome: from neglected to respected
- Author
-
Lisanne Vervoort and Joris R. Vermeesch
- Subjects
genomic disorders ,low copy repeats ,segmental duplications ,Other systems of medicine ,RZ201-999 - Abstract
DNA paralogs that have a length of at least 1 kilobase (kb) and are duplicated with a sequence identity of over 90% are classified as low copy repeats (LCRs) or segmental duplications (SDs). They constitute 6.6% of the genome and are clustering in specific genomic loci. Due to the high sequence homology between these duplicated regions, they can misalign during meiosis resulting in non-allelic homologous recombination (NAHR) and leading to structural variation such as deletions, duplications, inversions, and translocations. When such rearrangements result in a clinical phenotype, they are categorized as a genomic disorder. The presence of multiple copies of larger genomic segments offers opportunities for evolution. First, the creation of new genes in the human lineage will lead to human-specific traits and adaptation. Second, LCR variation between human populations can give rise to phenotypic variability. Hence, the rearrangement predisposition associated with LCRs should be interpreted in the context of the evolutionary advantages.
- Published
- 2023
20. NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads
- Author
-
Eleni Adam, Desh Ranjan, and Harold Riethman
- Subjects
Telomeres ,Subtelomeres ,Segmental duplications ,Tandem repeats ,Hybrid assembly ,Nanopore ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function. Results: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs as well as identification/correction of any misassemblies. Our method was tested for a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of Nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG)n tracts).
Conclusion: In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10–20 kb reads are unavailable.
- Published
- 2022
21. Complete Sequence of the 22q11.2 Allele in 1,053 Subjects with 22q11.2 Deletion Syndrome Reveals Modifiers of Conotruncal Heart Defects
- Author
-
Zhao, Yingjie, Diacou, Alexander, Johnston, H Richard, Musfee, Fadi I, McDonald-McGinn, Donna M, McGinn, Daniel, Crowley, T Blaine, Repetto, Gabriela M, Swillen, Ann, Breckpot, Jeroen, Vermeesch, Joris R, Kates, Wendy R, Digilio, M Cristina, Unolt, Marta, Marino, Bruno, Pontillo, Maria, Armando, Marco, Di Fabio, Fabio, Vicari, Stefano, van den Bree, Marianne, Moss, Hayley, Owen, Michael J, Murphy, Kieran C, Murphy, Clodagh M, Murphy, Declan, Schoch, Kelly, Shashi, Vandana, Tassone, Flora, Simon, Tony J, Shprintzen, Robert J, Campbell, Linda, Philip, Nicole, Heine-Suñer, Damian, García-Miñaúr, Sixto, Fernández, Luis, Consortium, International 22q11 2 Brain and Behavior, Antonarakis, Stylianos E, Biondi, Massimo, Boot, Erik, Breetvelt, Elemi, Busa, Tiffany, Butcher, Nancy, Buzzanca, Antonino, Carmel, Miri, Cleynen, Isabelle, Cutler, David, Dallapiccola, Bruno, de la Fuente Sanches, María Angeles, Epstein, Michael P, Evers, Rens, Fernandez, Luis, Fritsch, Rosemarie, Algas, Fernando García, Guo, Tingwei, Gur, Raquel, Hestand, Matthew S, Heung, Tracy, Hooper, Stephen, Jin, Andrea, Kushan-Wells, Leila, Laorden-Nieto, Alejandra Teresa, Lattanzi, Guido, Marshall, Christian, McCabe, Kathryn, Michaelovsky, Elena, Ornstein, Claudia, Silversides, Candice, Tran, Oanh, van Duin, Esther DA, Vergaelen, Elfi, Warren, Steve T, Weinberger, Ronnie, Weizman, Abraham, Zhang, Zhengdong, Zwick, Michael, Bearden, Carrie E, Vingerhoets, Claudia, van Amelsvoort, Therese, Eliez, Stephan, Schneider, Maude, Vorstman, Jacob AS, Gothelf, Doron, Zackai, Elaine, Agopian, AJ, Gur, Raquel E, Bassett, Anne S, Emanuel, Beverly S, Goldmuntz, Elizabeth, Mitchell, Laura E, Wang, Tao, and Morrow, Bernice E
- Subjects
Biological Sciences ,Biomedical and Clinical Sciences ,Genetics ,Epidemiology ,Health Sciences ,Clinical Research ,Human Genome ,Heart Disease ,Cardiovascular ,Pediatric ,Aetiology ,2.1 Biological and endogenous factors ,Case-Control Studies ,Chromosome Deletion ,Chromosomes ,Human ,Pair 22 ,Cohort Studies ,Female ,Genome-Wide Association Study ,Heart Defects ,Congenital ,Humans ,Linkage Disequilibrium ,Male ,Phenotype ,Polymorphism ,Single Nucleotide ,Proto-Oncogene Mas ,Segmental Duplications ,Genomic ,International 22q11.2 Brain and Behavior Consortium ,CRKL ,DiGeorge syndrome ,TBX1 ,chromosome 22q11.2 deletion syndrome ,complex trait ,congenital heart disease ,conotruncal heart defects ,copy number variation ,genetic association ,genetic modifier ,haploinsufficiency ,Medical and Health Sciences ,Genetics & Heredity ,Biological sciences ,Biomedical and clinical sciences ,Health sciences - Abstract
The 22q11.2 deletion syndrome (22q11.2DS) results from non-allelic homologous recombination between low-copy repeats termed LCR22. About 60%-70% of individuals with the typical 3 megabase (Mb) deletion from LCR22A-D have congenital heart disease, mostly of the conotruncal type (CTD), whereas others have normal cardiac anatomy. In this study, we tested whether variants in the hemizygous LCR22A-D region are associated with risk for CTDs on the basis of the sequence of the 22q11.2 region from 1,053 22q11.2DS individuals. We found a significant association (FDR p < 0.05) of the CTD subset with 62 common variants in a single linkage disequilibrium (LD) block in a 350 kb interval harboring CRKL. A total of 45 of the 62 variants were associated with increased risk for CTDs (odds ratio [OR] ranges: 1.64-4.75). Associations of four variants were replicated in a meta-analysis of three genome-wide association studies of CTDs in affected individuals without 22q11.2DS. One of the replicated variants, rs178252, is located in an open chromatin region and resides in the double-elite enhancer, GH22J020947, that is predicted to regulate CRKL (CRK-like proto-oncogene, cytoplasmic adaptor) expression. Approximately 23% of patients with nested LCR22C-D deletions have CTDs, and inactivation of Crkl in mice causes CTDs, thus implicating this gene as a modifier. Rs178252 and rs6004160 are expression quantitative trait loci (eQTLs) of CRKL. Furthermore, set-based tests identified an enhancer that is predicted to target CRKL and is significantly associated with CTD risk (GH22J020946, sequence kernel association test (SKAT) p = 7.21 × 10⁻⁵) in the 22q11.2DS cohort. These findings suggest that variance in CTD penetrance in the 22q11.2DS population can be explained in part by variants affecting CRKL expression.
- Published
- 2020
22. Discovery of tandem and interspersed segmental duplications using high throughput sequencing
- Author
-
Soylev, Arda, Le, Thong, Amini, Hajar, Alkan, Can, and Hormozdiari, Fereydoun
- Subjects
Genetics ,Human Genome ,Bioengineering ,Biotechnology ,Good Health and Well Being ,Algorithms ,Genome ,Human ,Genomics ,High-Throughput Nucleotide Sequencing ,Humans ,Segmental Duplications ,Genomic ,Software ,Mathematical Sciences ,Biological Sciences ,Information and Computing Sciences ,Bioinformatics - Abstract
MOTIVATION: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of simpler variants. RESULTS: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, using 30× coverage, TARDIS achieved 96% sensitivity with only 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. 
Furthermore, we showed a surprisingly low false discovery rate of our approach for discovery of tandem, direct and inverted interspersed segmental duplications prediction on CHM1 (
- Published
- 2019
23. Genome maps across 26 human populations reveal population-specific patterns of structural variation.
- Author
-
Levy-Sakin, Michal, Pastor, Steven, Mostovoy, Yulia, Li, Le, Leung, Alden KY, McCaffrey, Jennifer, Young, Eleanor, Lam, Ernest T, Hastie, Alex R, Wong, Karen HY, Chung, Claire YL, Ma, Walfred, Sibert, Justin, Rajagopalan, Ramakrishnan, Jin, Nana, Chow, Eugene YC, Chu, Catherine, Poon, Annie, Lin, Chin, Naguib, Ahmed, Wang, Wei-Ping, Cao, Han, Chan, Ting-Fung, Yip, Kevin Y, Xiao, Ming, and Kwok, Pui-Yan
- Subjects
Chromosomes ,Human ,Y ,Humans ,Chromosome Mapping ,Sequence Analysis ,DNA ,Computational Biology ,Genomics ,Phylogeny ,Base Sequence ,Gene Dosage ,Mutation ,Genome ,Human ,Algorithms ,Female ,Male ,Genomic Structural Variation ,Segmental Duplications ,Genomic ,Genetic Linkage - Abstract
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.
- Published
- 2019
24. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing.
- Author
-
Chen, Xiao, Harting, John, Farrow, Emily, Thiffault, Isabelle, Kasperaviciute, Dalia, Hoischen, Alexander, Gilissen, Christian, Pastinen, Tomi, and Eberle, Michael A.
- Subjects
SPINAL muscular atrophy ,HAPLOGROUPS ,HAPLOTYPES ,GENETIC markers ,MEDICAL informatics ,CHROMOSOMES ,SHORT tandem repeat analysis - Abstract
Spinal muscular atrophy, a leading cause of early infant death, is caused by bi-allelic mutations of SMN1. Sequence analysis of SMN1 is challenging due to high sequence similarity with its paralog SMN2. Both genes have variable copy numbers across populations. Furthermore, without pedigree information, it is currently not possible to identify silent carriers (2+0) with two copies of SMN1 on one chromosome and zero copies on the other. We developed Paraphase, an informatics method that identifies full-length SMN1 and SMN2 haplotypes, determines the gene copy numbers, and calls phased variants using long-read PacBio HiFi data. The SMN1 and SMN2 copy-number calls by Paraphase are highly concordant with orthogonal methods (99.2% for SMN1 and 100% for SMN2). We applied Paraphase to 438 samples across 5 ethnic populations to conduct a population-wide haplotype analysis of these highly homologous genes. We identified major SMN1 and SMN2 haplogroups and characterized their co-segregation through pedigree-based analyses. We identified two SMN1 haplotypes that form a common two-copy SMN1 allele in African populations. Testing positive for these two haplotypes in an individual with two copies of SMN1 gives a silent carrier risk of 88.5%, which is significantly higher than the currently used marker (1.7%–3.0%). Extending beyond simple copy-number testing, Paraphase can detect pathogenic variants and enable potential haplotype-based screening of silent carriers through statistical phasing of haplotypes into alleles. Future analysis of larger population data will allow identification of more diverse haplotypes and genetic markers for silent carriers. We developed Paraphase, an informatics method that, combined with highly accurate long reads, can resolve the highly homologous SMN1 / SMN2 genes involved in spinal muscular atrophy. 
We characterized SMN1 / SMN2 haplotypes across populations and identified new genetic markers for silent carriers (2+0) with both copies of SMN1 on the same chromosome.
- Published
- 2023
25. Fast characterization of segmental duplication structure in multiple genome assemblies
- Author
-
Hamza Išerić, Can Alkan, Faraz Hach, and Ibrahim Numanagić
- Subjects
Genome analysis ,Fast alignment ,Segmental duplications ,Sequence decomposition ,Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Motivation: The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. Results: Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7–33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years. Availability and implementation: BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser.
- Published
- 2022
26. Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis
- Author
-
Fiddes, Ian T, Lodewijk, Gerrald A, Mooring, Meghan, Bosworth, Colleen M, Ewing, Adam D, Mantalas, Gary L, Novak, Adam M, van den Bout, Anouk, Bishara, Alex, Rosenkrantz, Jimi L, Lorig-Roach, Ryan, Field, Andrew R, Haeussler, Maximilian, Russo, Lotte, Bhaduri, Aparna, Nowakowski, Tomasz J, Pollen, Alex A, Dougherty, Max L, Nuttle, Xander, Addor, Marie-Claude, Zwolinski, Simon, Katzman, Sol, Kriegstein, Arnold, Eichler, Evan E, Salama, Sofie R, Jacobs, Frank MJ, and Haussler, David
- Subjects
Biological Sciences ,Biomedical and Clinical Sciences ,Genetics ,Stem Cell Research ,Stem Cell Research - Induced Pluripotent Stem Cell - Human ,Stem Cell Research - Induced Pluripotent Stem Cell ,Congenital Structural Anomalies ,Brain Disorders ,Stem Cell Research - Nonembryonic - Non-Human ,Stem Cell Research - Embryonic - Human ,Neurosciences ,Intellectual and Developmental Disabilities (IDD) ,Mental Health ,Pediatric ,1.1 Normal biological development and functioning ,Neurological ,Animals ,Brain ,Cell Differentiation ,Cerebral Cortex ,Embryonic Stem Cells ,Female ,Gene Deletion ,Genes ,Reporter ,Gorilla gorilla ,HEK293 Cells ,Humans ,Neocortex ,Neural Stem Cells ,Neurogenesis ,Neuroglia ,Neurons ,Pan troglodytes ,Receptor ,Notch2 ,Sequence Analysis ,RNA ,Signal Transduction ,1q21.1 ,Notch signaling ,autism ,cortical organoids ,human evolution ,neural stem cells ,neurodevelopment ,neurodevelopmental disorders ,segmental duplications ,structural variation ,Medical and Health Sciences ,Developmental Biology ,Biological sciences ,Biomedical and clinical sciences - Abstract
Genetic changes causing brain size expansion in human evolution have remained elusive. Notch signaling is essential for radial glia stem cell proliferation and is a determinant of neuronal number in the mammalian cortex. We find that three paralogs of human-specific NOTCH2NL are highly expressed in radial glia. Functional analysis reveals that different alleles of NOTCH2NL have varying potencies to enhance Notch signaling by interacting directly with NOTCH receptors. Consistent with a role in Notch signaling, NOTCH2NL ectopic expression delays differentiation of neuronal progenitors, while deletion accelerates differentiation into cortical neurons. Furthermore, NOTCH2NL genes provide the breakpoints in 1q21.1 distal deletion/duplication syndrome, where duplications are associated with macrocephaly and autism and deletions with microcephaly and schizophrenia. Thus, the emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, accompanied by loss of genomic stability at the 1q21.1 locus and resulting recurrent neurodevelopmental disorders.
- Published
- 2018
27. NPGREAT: assembly of human subtelomere regions with the use of ultralong nanopore reads and linked-reads.
- Author
-
Adam, Eleni, Ranjan, Desh, and Riethman, Harold
- Subjects
TELOMERES ,HUMAN DNA ,CELL physiology ,TANDEM repeats ,DNA sequencing ,HAPLOIDY ,CELL lines - Abstract
Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function. Results: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs as well as identification/correction of any misassemblies. Our method was tested for a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of Nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG)n tracts).
Conclusion: In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10–20 kb reads are unavailable.
- Published
- 2022
28. Modelling segmental duplications in the human genome
- Author
-
Eldar T. Abdullaev, Iren R. Umarova, and Peter F. Arndt
- Subjects
Segmental duplications ,SDs ,Complex networks ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Background: Segmental duplications (SDs) are long DNA sequences that are repeated in a genome and have high sequence identity. In contrast to repetitive elements they are often unique and only sometimes have multiple copies in a genome. There are several well-studied mechanisms responsible for segmental duplications: non-allelic homologous recombination, non-homologous end joining and replication slippage. Such duplications play an important role in evolution; however, we do not have a full understanding of the dynamic properties of the duplication process. Results: We study segmental duplications through a graph representation where nodes represent genomic regions and edges represent duplications between them. The resulting network (the SD network) is quite complex and has distinct features which allow us to make inferences about the evolution of segmental duplications. We propose a network growth model that explains features of the SD network, thus giving us insights into the dynamics of segmental duplications in the human genome. Based on our analysis of genomes of other species, the network growth model seems to be applicable to multiple mammalian genomes. Conclusions: Our analysis suggests that duplication rates of genomic loci grow linearly with the number of copies of a duplicated region. Several scenarios explaining such preferential duplication rates are suggested.
- Published
- 2021
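The conclusion of the abstract above, that a locus's duplication rate grows linearly with its current copy number, describes a preferential-growth process. The sketch below simulates that process in a minimal form. It is not the authors' network growth model: it tracks only copy counts per family, with no graph of nodes and edges. The function name, the starting state of one copy per family, and the parameter values are all assumptions.

```python
import random


def simulate_preferential_duplication(n_families, n_steps, seed=0):
    """Simulate copy-number growth where duplication rate scales with copy count.

    Each family starts with one copy. At every step one family is chosen with
    probability proportional to its current number of copies and gains a copy.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    copies = [1] * n_families
    for _ in range(n_steps):
        idx = rng.choices(range(n_families), weights=copies, k=1)[0]
        copies[idx] += 1
    return copies


copies = simulate_preferential_duplication(n_families=100, n_steps=500, seed=1)
print("total copies:", sum(copies))
print("largest family:", max(copies))
print("families never duplicated:", sum(1 for c in copies if c == 1))
```

Because families that already have many copies are chosen more often, the copy-number distribution typically becomes skewed: a few large families and many that never duplicate. Replacing `weights=copies` with uniform weights gives a baseline for comparison.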
29. Segmental duplications drive the evolution of accessory regions in a major crop pathogen
- Author
-
van Westerhoven, Anouk C, Aguilera-Galvez, Carolina, Nakasato-Tagami, Giuliana, Shi-Kunne, Xiaoqian, Martinez de la Parte, Einar, Chavarro-Carrero, Edgar, Meijer, Harold J G, Feurtey, Alice, Maryani, Nani, Ordóñez, Nadia, Schneiders, Harrie, Nijbroek, Koen, Wittenberg, Alexander H J, Hofstede, Rene, García-Bastidas, Fernando, Sørensen, Anker, Swennen, Ronny, Drenth, Andre, Stukenbrock, Eva H, Kema, Gert H J, and Seidl, Michael F
- Abstract
Many pathogens evolved compartmentalized genomes with conserved core and variable accessory regions (ARs) that carry effector genes mediating virulence. The fungal plant pathogen Fusarium oxysporum has such ARs, often spanning entire chromosomes. The presence of specific ARs influences the host range, and horizontal transfer of ARs can modify the pathogenicity of the receiving strain. However, how these ARs evolve in strains that infect the same host remains largely unknown. We defined the pan-genome of 69 diverse F. oxysporum strains that cause Fusarium wilt of banana, a significant constraint to global banana production, and analyzed the diversity and evolution of the ARs. Accessory regions in F. oxysporum strains infecting the same banana cultivar are highly diverse, and we could not identify any shared genomic regions and in planta-induced effectors. We demonstrate that segmental duplications drive the evolution of ARs. Furthermore, we show that recent segmental duplications specifically in accessory chromosomes cause the expansion of ARs in F. oxysporum. Taken together, we conclude that extensive recent duplications drive the evolution of ARs in F. oxysporum, which contribute to the evolution of virulence.
- Published
- 2024
30. Human adaptation and evolution by segmental duplication
- Author
-
Dennis, Megan Y and Eichler, Evan E
- Subjects
Biological Sciences ,Genetics ,Underpinning research ,1.1 Normal biological development and functioning ,Adaptation ,Physiological ,Evolution ,Molecular ,Genome ,Human ,Genotype ,Humans ,Segmental Duplications ,Genomic ,Developmental Biology ,Biochemistry and cell biology - Abstract
Duplications are the primary force by which new gene functions arise and provide a substrate for large-scale structural variation. Analysis of thousands of genomes shows that humans and great apes have more genetic differences in content and structure over recent segmental duplications than any other euchromatic region. Novel human-specific duplicated genes, ARHGAP11B and SRGAP2C, have recently been described with a potential role in neocortical expansion and increased neuronal spine density. Large segmental duplications and the structural variants they promote are also frequently stratified between human populations with a subset being subjected to positive selection. The impact of recent duplications on human evolution and adaptation is only beginning to be realized as new technologies enhance their discovery and accurate genotyping.
- Published
- 2016
31. Transposable element subfamily annotation has a reproducibility problem
- Author
-
Kaitlin M. Carey, Gilia Patterson, and Travis J. Wheeler
- Subjects
Transposable elements ,Interspersed repeats ,Subfamilies ,Segmental duplications ,Genetics ,QH426-470 - Abstract
Background: Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee. Results: We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite the fact that they are expected to be annotated as belonging to the same subfamily. Point mutations and homologous recombination appear to be responsible for some of this discordant annotation (particularly in the young Alu family), but are unlikely to fully explain the annotation unreliability. Conclusions: The surprisingly high level of disagreement in subfamily annotation of homologous sequences highlights a need for further research into definition of TE subfamilies, methods for representing subfamily annotation confidence of TE instances, and approaches to better utilizing such nuanced annotation data in downstream analysis.
- Published
- 2021
- Full Text
- View/download PDF
32. Fast characterization of segmental duplication structure in multiple genome assemblies.
- Author
-
Išerić, Hamza, Alkan, Can, Hach, Faraz, and Numanagić, Ibrahim
- Subjects
ARCHITECTURAL details ,PROGRAMMING languages ,SEQUENCE alignment - Abstract
Motivation: The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. Results: Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7–33 × speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years. Availability and implementation: BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Circular DNA intermediates in the generation of large human segmental duplications
- Author
-
Javier U. Chicote, Marcos López-Sánchez, Tomàs Marquès-Bonet, José Callizo, Luis A. Pérez-Jurado, and Antonio García-España
- Subjects
Segmental duplications ,Circular DNA ,Human genome evolution ,X-Y transposed region ,Chromoanasynthesis ,MMBIR/FoSTeS ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Background: Duplications of large genomic segments provide genetic diversity in genome evolution. Despite their importance, how these duplications are generated remains uncertain, particularly for distant duplicated genomic segments. Results: Here we provide evidence of the participation of circular DNA intermediates in the single generation of some large human segmental duplications. A specific reversion of sequence order from A-B/C-D to B-A/D-C between duplicated segments and the presence of only microhomologies and short indels at the evolutionary breakpoints suggest a circularization of the donor ancestral locus and an accidental replicative interaction with the acceptor locus. Conclusions: This novel mechanism of random genomic mutation could explain several distant genomic duplications including some of the ones that took place during recent human evolution.
- Published
- 2020
- Full Text
- View/download PDF
34. Quantitative assessment reveals the dominance of duplicated sequences in germline-derived extrachromosomal circular DNA.
- Author
-
Mouakkad-Montoya, Lila, Murata, Michael M., Sulovari, Arvis, Suzuki, Ryusuke, Osia, Beth, Malkova, Anna, Makoto Katsumata, Giuliano, Armando E., Eichler, Evan E., and Hisashi Tanaka
- Subjects
*EXTRACHROMOSOMAL DNA , *CIRCULAR DNA , *NUCLEAR DNA , *HUMAN DNA , *HUMAN genome - Abstract
Extrachromosomal circular DNA (eccDNA) originates from linear chromosomal DNA in various human tissues under physiological and disease conditions. The genomic origins of eccDNA have largely been investigated using in vitro-amplified DNA. However, in vitro amplification obscures quantitative information by skewing the total population stoichiometry. In addition, the analyses have focused on eccDNA stemming from single-copy genomic regions, leaving eccDNA from multicopy regions unexamined. To address these issues, we isolated eccDNA without in vitro amplification (naïve small circular DNA, nscDNA) and assessed the populations quantitatively by integrated genomic, molecular, and cytogenetic approaches. nscDNA of up to tens of kilobases were successfully enriched by our approach and were predominantly derived from multicopy genomic regions including segmental duplications (SDs). SDs, which account for 5% of the human genome and are hotspots for copy number variations, were significantly overrepresented in sperm nscDNA, with three times more sequencing reads derived from SDs than from the entire single-copy regions. SDs were also overrepresented in mouse sperm nscDNA, which we estimated to comprise 0.2% of nuclear DNA. Considering that eccDNA can be integrated into chromosomes, germline-derived nscDNA may be a mediator of genome diversity. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
35. Characterization of primate structural variation using diverse sequencing technologies
- Author
-
Soto, Daniela Catalina
- Subjects
Genetics ,Bioinformatics ,Biology ,Long-read sequencing ,Segmental duplications ,Structural variation - Abstract
Elucidating the genetic changes underlying the evolution of human traits remains an unfinished puzzle. Structural variants (SVs) account for more genetic differences than single-nucleotide polymorphisms between humans and our closest living relatives, chimpanzees, and are a hallmark of great ape evolution. The genomes of great apes are enriched in large interspersed segmental duplications (SDs), defined as duplications larger than 1 kbp with over 90% sequence identity, that sensitize the genome to further genomic rearrangements, including copy-number variation, via non-allelic homologous recombination. Despite their relevance, the identification and characterization of these SVs has been hindered by short read lengths, as short reads lack enough sequence context to discover breakpoints and cannot unequivocally be mapped to highly identical duplicates. Long-read sequencing technologies overcome these limitations by providing reads thousands of bases long, but the availability of population cohorts remains limited. This thesis studies primate SVs and SDs characterized using diverse sequencing technologies and assesses their representation in reference genomes, variation across modern populations, their putative molecular impacts, and their roles in evolution and adaptation. We found novel SVs, including 88 deletions and 36 inversions, in two chimpanzee individuals sequenced with nanopore and optical mapping. Deletions and inversion breakpoints were depleted within topologically associated domains but enriched in differentially expressed genes between the two species. Focusing on human SDs, we identified eight Mbp of erroneously collapsed duplications in the human reference genome, impacting 48 protein coding and ten medically relevant genes, that are corrected in the first complete sequence of a human genome, T2T-CHM13.
Leveraging this new reference, we identified 417 genes embedded in SDs with over 98% sequence identity (SD-98) that are near copy-number (CN) fixed in modern humans (1000 Genomes Project; 1KGP), 205 genes showing stratification between diverse modern populations (VST>95th percentile), and 22 protein-encoding genes showing consistent Tajima’s D outlier values across all humans examined. Our approach highlighted potential relevant human gene duplications, which are priority candidates for functional studies. Finally, we provide a compendium of tools and practices that we recommend be adopted by computational biologists to increase reproducibility in the field.
- Published
- 2022
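The working definition quoted in the abstract above (duplications larger than 1 kbp with over 90% sequence identity, with SD-98 denoting over 98% identity) can be written as a simple filter. This is only an illustrative sketch: the `PairwiseAlignment` type and `classify_duplication` function are hypothetical names, not part of the thesis or of any published tool.

```python
from dataclasses import dataclass


@dataclass
class PairwiseAlignment:
    """A hypothetical record for one pair of duplicated segments."""
    length_bp: int    # aligned length in base pairs
    identity: float   # fraction of identical bases, between 0 and 1


def classify_duplication(aln: PairwiseAlignment) -> str:
    """Apply the thresholds from the abstract: >1 kbp and >90% identity."""
    if aln.length_bp > 1000 and aln.identity > 0.90:
        # SD-98 is the high-identity subset described in the abstract.
        return "SD-98" if aln.identity > 0.98 else "SD"
    return "not SD"


if __name__ == "__main__":
    for aln in (PairwiseAlignment(5000, 0.95), PairwiseAlignment(5000, 0.99)):
        print(aln, classify_duplication(aln))
```

Real SD callers work from whole-genome alignments and apply further filtering (for example, masking common repeats), so the thresholds alone are a necessary condition rather than a complete detection method.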
36. Segmental duplication as potential biomarkers for non-invasive prenatal testing of aneuploidies
- Author
-
Xinwen Chen, Yifan Li, Qiuying Huang, Xingming Lin, Xudong Wang, Yafang Wang, Ying Liu, Qiushun He, Yinghua Liu, Ting Wang, Zhi-Liang Ji, and Qingge Li
- Subjects
Segmental duplications ,Computational program ,Chromosome aneuploidy ,Digital PCR ,Prenatal diagnosis ,Medicine ,Medicine (General) ,R5-920 - Abstract
Background: Segmental duplication (SD) regions are distinct targets for aneuploidy detection owing to the virtual elimination of amplification bias. The difficulty of searching SD sequences for assay design has hampered their applications. Methods: We developed a computational program, ChAPDes, which integrates SD searching, refinement, and design of specific PCR primer/probe sets in a pipeline to remove most of the manual work. The generated primer/probe sets were first tested in a multiplex multicolour melting curve analysis for the detection of five common aneuploidies. The primer/probe sets were then tested in a digital PCR assay for the detection of trisomy 21. Finally, a digital PCR protocol was established to quantify maternal plasma DNA sequences for the non-invasive prenatal detection of fetal trisomy 21. Findings: ChAPDes could output 21,772 candidate primer/probe sets for trisomy 13, 18, 21 and sex chromosome aneuploidies within 2 working days. Clinical evaluation of the multiplex multicolour melting curve analysis involving 463 fetal genomic DNA samples revealed a sensitivity of 100% and specificity of 99.64% in comparison with the reference methods. Using the established digital PCR protocol, we correctly identified two trisomy 21 fetuses and thirteen euploid fetuses from the maternal plasma samples. Interpretation: The combination of ChAPDes with digital PCR detection could facilitate the use of SD as potential biomarkers for the non-invasive prenatal testing of fetal chromosomal aneuploidies.
- Published
- 2021
- Full Text
- View/download PDF
37. Gapless indica rice genome reveals synergistic contributions of active transposable elements and segmental duplications to rice genome evolution.
- Author
-
Li, Kui, Jiang, Wenkai, Hui, Yuanyuan, Kong, Mengjuan, Feng, Li-Ying, Gao, Li-Zhi, Li, Pengfu, and Lu, Shan
- Abstract
The ultimate goal of genome assembly is a high-accuracy gapless genome. Here, we report a new assembly pipeline that is used to produce a gapless genome for the indica rice cultivar Minghui 63. The resulting 397.71-Mb final assembly is composed of 12 contigs with a contig N50 size of 31.93 Mb. Each chromosome is represented by a single contig and the genomic sequences of all chromosomes are gapless. Quality evaluation of this gapless genome assembly showed that gene regions in our assembly have the highest completeness compared with the other 15 reported high-quality rice genomes. Further comparison with the japonica rice genome revealed that the gapless indica genome assembly contains more transposable elements (TEs) and segmental duplications (SDs), the latter of which produce many duplicated genes that can affect agronomic traits through dose effect or sub-/neo-functionalization. The insertion of TEs can also affect the expression of duplicated genes, which may drive the evolution of these genes. Furthermore, we found the expansion of nucleotide-binding site with leucine-rich repeat disease-resistance genes and cis-zeatin-O-glucosyltransferase growth-related genes in SDs in the gapless indica genome assembly, suggesting that SDs contribute to the adaptive evolution of rice disease resistance and developmental processes. Collectively, our findings suggest that active TEs and SDs synergistically contribute to rice genome evolution. In this study, we present a new assembly pipeline to produce a gapless 397.71-Mb genome for the indica rice cultivar Minghui 63, which is composed of 12 contigs, with a contig N50 size of 31.93 Mb, and each chromosome is represented by a single contig. Compared with japonica rice, the indica genome has more transposable elements (TEs) and segmental duplications (SDs), and our findings suggest that active TEs and SDs together provide synergistic contributions to rice genome evolution. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
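The contig N50 reported in the abstract above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows; the helper name `n50` and the toy contig lengths are assumptions for illustration and are not the Minghui 63 data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_contigs = [40, 30, 20, 10]  # made-up lengths, total = 100
    print(n50(toy_contigs))
```

For an assembly with one contig per chromosome, as described in the abstract, the N50 is simply a chromosome length, which is why the reported value is so large.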
38. Palindromic GOLGA8 core duplicons promote chromosome 15q13.3 microdeletion and evolutionary instability.
- Author
-
Antonacci, Francesca, Dennis, Megan Y, Huddleston, John, Sudmant, Peter H, Steinberg, Karyn Meltz, Rosenfeld, Jill A, Miroballo, Mattia, Graves, Tina A, Vives, Laura, Malig, Maika, Denman, Laura, Raja, Archana, Stuart, Andrew, Tang, Joyce, Munson, Brenton, Shaffer, Lisa G, Amemiya, Chris T, Wilson, Richard K, and Eichler, Evan E
- Subjects
Chromosomes ,Artificial ,Bacterial ,Chromosomes ,Human ,Pair 15 ,Animals ,Primates ,Humans ,Seizures ,Chromosome Disorders ,Chromosome Deletion ,In Situ Hybridization ,Fluorescence ,Cluster Analysis ,Sequence Analysis ,DNA ,Repetitive Sequences ,Nucleic Acid ,Gene Dosage ,Polymorphism ,Genetic ,Genome ,Human ,Models ,Genetic ,Comparative Genomic Hybridization ,Segmental Duplications ,Genomic ,Biological Evolution ,Intellectual Disability ,Genetics ,Biological Sciences ,Medical and Health Sciences ,Developmental Biology - Abstract
Recurrent deletions of chromosome 15q13.3 associate with intellectual disability, schizophrenia, autism and epilepsy. To gain insight into the instability of this region, we sequenced it in affected individuals, normal individuals and nonhuman primates. We discovered five structural configurations of the human chromosome 15q13.3 region ranging in size from 2 to 3 Mb. These configurations arose recently (∼0.5-0.9 million years ago) as a result of human-specific expansions of segmental duplications and two independent inversion events. All inversion breakpoints map near GOLGA8 core duplicons, a ∼14-kb primate-specific chromosome 15 repeat that became organized into larger palindromic structures. GOLGA8-flanked palindromes also demarcate the breakpoints of recurrent 15q13.3 microdeletions, the expansion of chromosome 15 segmental duplications in the human lineage and independent structural changes in apes. The significant clustering (P = 0.002) of breakpoints provides mechanistic evidence for the role of this core duplicon and its palindromic architecture in promoting the evolutionary and disease-related instability of chromosome 15.
- Published
- 2014
39. Finished sequence and assembly of the DUF1220-rich 1q21 region using a haploid human genome
- Author
-
O’Bleness, Majesta, Searles, Veronica B, Dickens, C Michael, Astling, David, Albracht, Derek, Mak, Angel CY, Lai, Yvonne YY, Lin, Chin, Chu, Catherine, Graves, Tina, Kwok, Pui-Yan, Wilson, Richard K, and Sikela, James M
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Human Genome ,Biotechnology ,1.1 Normal biological development and functioning ,Generic health relevance ,Biological Evolution ,Carrier Proteins ,Chromosomes ,Human ,Pair 1 ,Comparative Genomic Hybridization ,DNA Copy Number Variations ,Genetic Linkage ,Genome ,Human ,Haploidy ,Humans ,Protein Structure ,Tertiary ,Segmental Duplications ,Genomic ,1q21 ,DUF1220 domain ,Hydatidiform mole ,Information and Computing Sciences ,Medical and Health Sciences ,Bioinformatics ,Biological sciences ,Biomedical and clinical sciences - Abstract
Background: Although the reference human genome sequence was declared finished in 2003, some regions of the genome remain incomplete due to their complex architecture. One such region, 1q21.1-q21.2, is of increasing interest due to its relevance to human disease and evolution. Elucidation of the exact variants behind these associations has been hampered by the repetitive nature of the region and its incomplete assembly. This region also contains 238 of the 270 human DUF1220 protein domains, which are implicated in human brain evolution and neurodevelopment. Additionally, examinations of this protein domain have been challenging due to the incomplete 1q21 build. To address these problems, a single-haplotype hydatidiform mole BAC library (CHORI-17) was used to produce the first complete sequence of the 1q21.1-q21.2 region. Results: We found and addressed several inaccuracies in the GRCh37 sequence of the 1q21 region on large and small scales, including genomic rearrangements and inversions, and incorrect gene copy number estimates and assemblies. The DUF1220-encoding NBPF genes required the most corrections, with 3 genes removed, 2 genes reassigned to the 1p11.2 region, 8 genes requiring assembly corrections for DUF1220 domains (~91 DUF1220 domains were misassigned), and multiple instances of nucleotide changes that reassigned the domain to a different DUF1220 subtype. These corrections resulted in an overall increase in DUF1220 copy number, yielding a haploid total of 289 copies. Approximately 20 of these new DUF1220 copies were the result of a segmental duplication from 1q21.2 to 1p11.2 that included two NBPF genes. Interestingly, this duplication may have been the catalyst for the evolutionarily important human lineage-specific chromosome 1 pericentric inversion. Conclusions: Through the hydatidiform mole genome sequencing effort, the 1q21.1-q21.2 region is complete and misassemblies involving inter- and intra-region duplications have been resolved.
The availability of this single haploid sequence path will aid in the investigation of many genetic diseases linked to 1q21, including several associated with DUF1220 copy number variations. Finally, the corrected sequence identified a recent segmental duplication that added 20 additional DUF1220 copies to the human genome, and may have facilitated the chromosome 1 pericentric inversion that is among the most notable human-specific genomic landmarks.
- Published
- 2014
40. Finished sequence and assembly of the DUF1220-rich 1q21 region using a haploid human genome.
- Author
-
O'Bleness, Majesta, Searles, Veronica B, Dickens, C Michael, Astling, David, Albracht, Derek, Mak, Angel CY, Lai, Yvonne YY, Lin, Chin, Chu, Catherine, Graves, Tina, Kwok, Pui-Yan, Wilson, Richard K, and Sikela, James M
- Subjects
Chromosomes ,Human ,Pair 1 ,Humans ,Carrier Proteins ,Protein Structure ,Tertiary ,Haploidy ,Genome ,Human ,Comparative Genomic Hybridization ,DNA Copy Number Variations ,Segmental Duplications ,Genomic ,Biological Evolution ,Genetic Linkage ,1q21 ,DUF1220 domain ,Hydatidiform mole ,Chromosomes ,Human ,Pair 1 ,Genome ,Protein Structure ,Tertiary ,Segmental Duplications ,Genomic ,Bioinformatics ,Biological Sciences ,Information and Computing Sciences ,Medical and Health Sciences - Abstract
Background: Although the reference human genome sequence was declared finished in 2003, some regions of the genome remain incomplete due to their complex architecture. One such region, 1q21.1-q21.2, is of increasing interest due to its relevance to human disease and evolution. Elucidation of the exact variants behind these associations has been hampered by the repetitive nature of the region and its incomplete assembly. This region also contains 238 of the 270 human DUF1220 protein domains, which are implicated in human brain evolution and neurodevelopment. Additionally, examinations of this protein domain have been challenging due to the incomplete 1q21 build. To address these problems, a single-haplotype hydatidiform mole BAC library (CHORI-17) was used to produce the first complete sequence of the 1q21.1-q21.2 region. Results: We found and addressed several inaccuracies in the GRCh37 sequence of the 1q21 region on large and small scales, including genomic rearrangements and inversions, and incorrect gene copy number estimates and assemblies. The DUF1220-encoding NBPF genes required the most corrections, with 3 genes removed, 2 genes reassigned to the 1p11.2 region, 8 genes requiring assembly corrections for DUF1220 domains (~91 DUF1220 domains were misassigned), and multiple instances of nucleotide changes that reassigned the domain to a different DUF1220 subtype. These corrections resulted in an overall increase in DUF1220 copy number, yielding a haploid total of 289 copies. Approximately 20 of these new DUF1220 copies were the result of a segmental duplication from 1q21.2 to 1p11.2 that included two NBPF genes. Interestingly, this duplication may have been the catalyst for the evolutionarily important human lineage-specific chromosome 1 pericentric inversion. Conclusions: Through the hydatidiform mole genome sequencing effort, the 1q21.1-q21.2 region is complete and misassemblies involving inter- and intra-region duplications have been resolved.
The availability of this single haploid sequence path will aid in the investigation of many genetic diseases linked to 1q21, including several associated with DUF1220 copy number variations. Finally, the corrected sequence identified a recent segmental duplication that added 20 additional DUF1220 copies to the human genome, and may have facilitated the chromosome 1 pericentric inversion that is among the most notable human-specific genomic landmarks.
- Published
- 2014
41. Evolutionary history of the human multigene families reveals widespread gene duplications throughout the history of animals
- Author
-
Nashaiman Pervaiz, Nazia Shakeel, Ayesha Qasim, Rabail Zehra, Saneela Anwar, Neenish Rana, Yongbiao Xue, Zhang Zhang, Yiming Bao, and Amir Ali Abbasi
- Subjects
Human ,Whole genome duplications ,Segmental duplications ,Paralogons ,Paralogy regions ,Vertebrate ,Evolution ,QH359-425 - Abstract
Background: The hypothesis that vertebrates have experienced two ancient, whole genome duplications (WGDs) is of central interest to evolutionary biology and has been implicated in evolution of developmental complexity. Three-way and four-way paralogy regions in human and other vertebrate genomes are considered as vital evidence to support this hypothesis. Alternatively, it has been proposed that such paralogy regions are created by small-scale duplications that occurred at different intervals over the evolution of life. Results: To address this debate, the present study investigates the evolutionary history of multigene families with at least three-fold representation on human chromosomes 1, 2, 8 and 20. Phylogenetic analysis and the tree topology comparisons classified the members of 36 multigene families into four distinct co-duplicated groups. Gene families falling within the same co-duplicated group might have duplicated together, whereas genes belonging to different co-duplicated groups might have distinct evolutionary origins. Conclusion: Taken together with previous investigations, the current study yielded no proof in favor of the WGD hypothesis. Rather, it appears that the vertebrate genome evolved as a result of small-scale duplication events that cover the entire span of the animals’ history.
- Published
- 2019
- Full Text
- View/download PDF
42. Segmental Duplications as a Complement Strategy to Short Tandem Repeats in the Prenatal Diagnosis of Down Syndrome
- Author
-
Mohammad Reza Miri, Jamileh Saberzadeh, Abbas Behzad Behbahani, Mohammad Bagher Tabei, Mohsen Alipour, and Majid Fardaei
- Subjects
Multiplex polymerase chain reaction ,Microsatellite repeats ,Down syndrome ,Segmental duplications ,Medicine (General) ,R5-920 - Abstract
Background: Quantitative fluorescence-polymerase chain reaction (QF-PCR) is an inexpensive and accurate method for the prenatal diagnosis of aneuploidies that applies short tandem repeats (STRs) as a chromosome-specific marker. Despite its apparent advantages, QF-PCR is not applicable in all cases due to the presence of uninformative STRs. This study was carried out to investigate the efficiency of a method based on applying segmental duplications (SDs) in conjunction with STRs as an alternative to stand-alone STR-based QF-PCR for the diagnosis of Down syndrome. Methods: Fifty amniotic fluid samples from pregnant women carrying Down syndrome fetuses, 9 amniotic fluid samples with 1 or without any informative STR marker (inconclusive), and 100 normal samples were selected from Shiraz, Iran, between October 2015 and December 2016. Analysis was done using an in-house STR-SD-based multiplex QF-PCR and the results were compared. Statistical analysis was performed using MedCalc, version 14. Results: All the normal, Down syndrome, and inconclusive samples were accurately identified by the STR-SD-based multiplex QF-PCR, yielding 100% sensitivity and 100% specificity. Karyotype analysis confirmed all the cases with normal or trisomic results. Conclusion: The STR-SD-based multiplex QF-PCR correctly identified all the normal and trisomy 21 samples regardless of the absence of informative STR markers. The STR-SD-based multiplex QF-PCR is a feasible and particularly useful assay in populations with a high prevalence of homozygote STR markers.
- Published
- 2019
- Full Text
- View/download PDF
43. Reconciling multiple genes trees via segmental duplications and losses
- Author
-
Riccardo Dondi, Manuel Lafond, and Celine Scornavacca
- Subjects
Phylogenetics ,Gene trees ,Species trees ,Reconciliation ,Segmental duplications ,Fixed-parameter tractability ,Biology (General) ,QH301-705.5 ,Genetics ,QH426-470 - Abstract
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost $\delta$ and $\lambda$, respectively. We show that the problem is polynomial-time solvable when $\delta \le \lambda$ (via LCA-mapping), while if $\delta > \lambda$ the problem is NP-hard, even when $\lambda = 0$ and a single gene tree is given, solving a long standing open problem on the complexity of multi-gene reconciliation. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are $\delta/\lambda$ and the number $d$ of segmental duplications, of time complexity $O\left(\lceil \frac{\delta}{\lambda} \rceil^{d} \cdot n \cdot \frac{\delta}{\lambda}\right)$. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or raise doubt on hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
- Published
- 2019
- Full Text
- View/download PDF
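The abstract above mentions LCA-mapping as the route to the polynomial-time case. As background, the classic LCA mapping for a single binary gene tree sends each gene-tree node to the lowest common ancestor, in the species tree, of the species found below it, and flags a node as a duplication when it maps to the same species node as one of its children. The sketch below shows only that building block under simplifying assumptions; it is not the paper's multi-tree segmental algorithm, and the function names and toy trees are hypothetical.

```python
def ancestors(node, parent):
    """Return the path from node up to the root of the species tree, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of species-tree nodes a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_map(gene_tree, parent, duplications):
    """Map a gene tree onto the species tree.

    gene_tree is a nested 2-tuple whose leaves are species names (strings).
    Returns the species node the root maps to, and appends to `duplications`
    the species node of every gene-tree node inferred to be a duplication.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left, right = gene_tree
    mapped_left = lca_map(left, parent, duplications)
    mapped_right = lca_map(right, parent, duplications)
    here = lca(mapped_left, mapped_right, parent)
    # A node mapping to the same place as a child cannot be a speciation.
    if here == mapped_left or here == mapped_right:
        duplications.append(here)
    return here


if __name__ == "__main__":
    # Toy species tree ((A,B)AB,C)ABC stored as child -> parent.
    species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}
    dups = []
    root = lca_map((("A", "B"), ("A", "C")), species_parent, dups)
    print(root, dups)
```

The segmental setting described in the abstract additionally groups duplications from several gene trees into shared events, which is where the costs $\delta$ and $\lambda$ come into play; that grouping is not attempted here.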
44. Modelling segmental duplications in the human genome.
- Author
-
Abdullaev, Eldar T., Umarova, Iren R., and Arndt, Peter F.
- Subjects
HUMAN genome ,DNA sequencing ,REPRESENTATIONS of graphs ,GENOMES ,DNA copy number variations ,GENE regulatory networks ,SPECIES ,LOCUS (Genetics) - Abstract
Background: Segmental duplications (SDs) are long DNA sequences that are repeated in a genome and have high sequence identity. In contrast to repetitive elements they are often unique and only sometimes have multiple copies in a genome. There are several well-studied mechanisms responsible for segmental duplications: non-allelic homologous recombination, non-homologous end joining and replication slippage. Such duplications play an important role in evolution; however, we do not have a full understanding of the dynamic properties of the duplication process. Results: We study segmental duplications through a graph representation where nodes represent genomic regions and edges represent duplications between them. The resulting network (the SD network) is quite complex and has distinct features which allow us to make inferences on the evolution of segmental duplications. We propose a network growth model that explains features of the SD network, thus giving us insights into the dynamics of segmental duplications in the human genome. Based on our analysis of genomes of other species, the network growth model seems to be applicable to multiple mammalian genomes. Conclusions: Our analysis suggests that duplication rates of genomic loci grow linearly with the number of copies of a duplicated region. Several scenarios explaining such preferential duplication rates were suggested. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
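The conclusion of the abstract above, that a region's duplication rate grows linearly with its current copy number, can be illustrated with a toy simulation in which each new duplication picks a region with probability proportional to how many copies it already has. This is a simplified sketch, not the authors' network growth model: the function name, parameters and starting conditions are all assumptions, and the graph structure of the SD network is ignored.

```python
import random


def simulate_copy_numbers(n_regions, n_duplications, seed=0):
    """Simulate copy-number growth under copy-proportional duplication.

    Every region starts with one copy. At each step one region is chosen
    with probability proportional to its current copy number and gains a copy.
    """
    rng = random.Random(seed)
    copies = [1] * n_regions
    for _ in range(n_duplications):
        # random.choices draws with the given (unnormalised) weights.
        idx = rng.choices(range(n_regions), weights=copies, k=1)[0]
        copies[idx] += 1
    return copies


if __name__ == "__main__":
    result = simulate_copy_numbers(n_regions=50, n_duplications=500)
    print(sorted(result, reverse=True)[:10])
```

Under this kind of rule a few regions tend to accumulate many copies while most stay near their starting count, which is the qualitative behaviour that preferential duplication is meant to capture.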
45. Global increases in both common and rare copy number load associated with autism
- Author
-
Girirajan, Santhosh, Johnson, Rebecca L, Tassone, Flora, Balciuniene, Jorune, Katiyar, Neerja, Fox, Keolu, Baker, Carl, Srikanth, Abhinaya, Yeoh, Kian Hui, Khoo, Su Jen, Nauth, Therese B, Hansen, Robin, Ritchie, Marylyn, Hertz-Picciotto, Irva, Eichler, Evan E, Pessah, Isaac N, and Selleck, Scott B
- Subjects
Biological Sciences ,Genetics ,Intellectual and Developmental Disabilities (IDD) ,Mental Health ,Clinical Research ,Pediatric ,Biotechnology ,Brain Disorders ,Human Genome ,Autism ,Behavioral and Social Science ,2.1 Biological and endogenous factors ,Mental health ,Autistic Disorder ,Case-Control Studies ,Child ,Child ,Preschool ,DNA Copy Number Variations ,Female ,Humans ,Male ,Oligonucleotide Array Sequence Analysis ,Segmental Duplications ,Genomic ,Sequence Deletion ,Medical and Health Sciences ,Genetics & Heredity - Abstract
Children with autism have an elevated frequency of large, rare copy number variants (CNVs). However, the global load of deletions or duplications, per se, and their size, location and relationship to clinical manifestations of autism have not been documented. We examined CNV data from 516 individuals with autism or typical development from the population-based Childhood Autism Risks from Genetics and Environment (CHARGE) study. We interrogated 120 regions flanked by segmental duplications (genomic hotspots) for events >50 kbp and the entire genomic backbone for variants >300 kbp using a custom targeted DNA microarray. This analysis was complemented by a separate study of five highly dynamic hotspots associated with autism or developmental delay syndromes, using a finely tiled array platform (>1 kbp) in 142 children matched for gender and ethnicity. In both studies, a significant increase in the number of base pairs of duplication, but not deletion, was associated with autism. Significantly elevated levels of CNV load remained after the removal of rare and likely pathogenic events. Further, the entire CNV load detected with the finely tiled array was contributed by common variants. The impact of this variation was assessed by examining the correlation of clinical outcomes with CNV load. The level of personal and social skills, measured by Vineland Adaptive Behavior Scales, negatively correlated (Spearman's r = -0.13, P = 0.034) with the duplication CNV load for the affected children; the strongest association was found for communication (P = 0.048) and socialization (P = 0.022) scores. We propose that CNV load, predominantly increased genomic base pairs of duplication, predisposes to autism.
- Published
- 2013
46. Genomic regions associated with microdeletion/ microduplication syndromes exhibit extreme diversity of structural variation.
- Author
-
Mostovoy, Yulia, Yilmaz, Feyza, Chow, Stephen K., Chu, Catherine, Lin, Chin, Geiger, Elizabeth A., Meeks, Naomi J. L., Chatfield, Kathryn C., Coughlin II, Curtis R., Surti, Urvashi, Kwok, Pui-Yan, and Shaikh, Tamim H.
- Subjects
- *
HIGH throughput screening (Drug development) , *GENOMES , *CHROMOSOME abnormalities , *GENOTYPES , *DESCRIPTIVE statistics , *DISEASE prevalence , *GENE mapping , *PHENOTYPES - Abstract
Segmental duplications (SDs) are a class of long, repetitive DNA elements whose paralogs share a high level of sequence similarity with each other. SDs mediate chromosomal rearrangements that lead to structural variation in the general population as well as genomic disorders associated with multiple congenital anomalies, including the 7q11.23 (Williams-Beuren syndrome, WBS), 15q13.3, and 16p12.2 microdeletion syndromes. Population-level characterization of SDs has generally been lacking because most techniques used for analyzing these complex regions are both labor- and cost-intensive. In this study, we have used a high-throughput technique to genotype complex structural variation with a single-molecule, long-range optical mapping approach. We characterized SDs and identified novel structural variants (SVs) at 7q11.23, 15q13.3, and 16p12.2 using optical mapping data from 154 phenotypically normal individuals from 26 populations comprising five super-populations. We detected several novel SVs for each locus, some of which had significantly different prevalence between populations. Additionally, we localized the microdeletion breakpoints to specific paralogous duplicons located within complex SDs in two patients with WBS, one patient with 15q13.3, and one patient with 16p12.2 microdeletion syndromes. The population-level data presented here highlight the extreme diversity of large and complex SVs within SD-containing regions. The approach we outline will greatly facilitate the investigation of the role of inter-SD structural variation as a driver of chromosomal rearrangements and genomic disorders. [ABSTRACT FROM AUTHOR]
- Published
- 2021
47. Genome-Wide Analysis of Copy Number Variants in Attention Deficit Hyperactivity Disorder: The Role of Rare Variants and Duplications at 15q13.3
- Author
-
Williams, Nigel M, Franke, Barbara, Mick, Eric, Anney, Richard JL, Freitag, Christine M, Gill, Michael, Thapar, Anita, O'Donovan, Michael C, Owen, Michael J, Holmans, Peter, Kent, Lindsey, Middleton, Frank, Zhang-James, Yanli, Liu, Lu, Meyer, Jobst, Nguyen, Thuy Trang, Romanos, Jasmin, Romanos, Marcel, Seitz, Christiane, Renner, Tobias J, Walitza, Susanne, Warnke, Andreas, Palmason, Haukur, Buitelaar, Jan, Rommelse, Nanda, Vasquez, Alejandro Arias, Hawi, Ziarih, Langley, Kate, Sergeant, Joseph, Steinhausen, Hans-Christoph, Roeyers, Herbert, Biederman, Joseph, Zaharieva, Irina, Hakonarson, Hakon, Elia, Josephine, Lionel, Anath C, Crosbie, Jennifer, Marshall, Christian R, Schachar, Russell, Scherer, Stephen W, Todorov, Alexandre, Smalley, Susan L, Loo, Sandra, Nelson, Stanley, Shtir, Corina, Asherson, Philip, Reif, Andreas, Lesch, Klaus-Peter, and Faraone, Stephen V
- Subjects
Human Genome ,Clinical Research ,Mental Health ,Genetics ,Pediatric ,Attention Deficit Hyperactivity Disorder (ADHD) ,Brain Disorders ,Serious Mental Illness ,2.1 Biological and endogenous factors ,Aetiology ,Adolescent ,Attention Deficit Disorder with Hyperactivity ,Canada ,Causality ,Child ,Child ,Preschool ,Female ,Gene Dosage ,Genetic Predisposition to Disease ,Genome-Wide Association Study ,Humans ,In Situ Hybridization ,Fluorescence ,Inheritance Patterns ,Polymorphism ,Single Nucleotide ,Receptors ,Nicotinic ,Segmental Duplications ,Genomic ,United Kingdom ,United States ,alpha7 Nicotinic Acetylcholine Receptor ,Medical and Health Sciences ,Psychology and Cognitive Sciences ,Psychiatry - Abstract
Objective: Attention deficit hyperactivity disorder (ADHD) is a common, highly heritable psychiatric disorder. Because of its multifactorial etiology, however, identifying the genes involved has been difficult. The authors followed up on recent findings suggesting that rare copy number variants (CNVs) may be important for ADHD etiology. Method: The authors performed a genome-wide analysis of large, rare CNVs (100 kb in size, which segregated into 912 independent loci. Overall, the rate of rare CNVs >100 kb was 1.15 times higher in ADHD case subjects relative to comparison subjects, with duplications spanning known genes showing a 1.2-fold enrichment. In accordance with a previous study, rare CNVs >500 kb showed the greatest enrichment (1.28-fold). CNVs identified in ADHD case subjects were significantly enriched for loci implicated in autism and in schizophrenia. Duplications spanning the CHRNA7 gene at chromosome 15q13.3 were associated with ADHD in single-locus analysis. This finding was consistently replicated in an additional 2,242 ADHD case subjects and 8,552 comparison subjects from four independent cohorts from the United Kingdom, the United States, and Canada. Presence of the duplication at 15q13.3 appeared to be associated with comorbid conduct disorder. Conclusions: These findings support the enrichment of large, rare CNVs in ADHD and implicate duplications at 15q13.3 as a novel risk factor for ADHD. With a frequency of 0.6% in the populations investigated and a relatively large effect size (odds ratio=2.22, 95% confidence interval=1.5–3.6), this locus could be an important contributor to ADHD etiology.
- Published
- 2012
48. A segmental genomic duplication generates a functional intron.
- Author
-
Hellsten, Uffe, Aspden, Julie, Rio, Donald C., and Rokhsar, Daniel
- Subjects
Animals ,Base Sequence ,Cell Line ,Humans ,Introns ,Molecular Sequence Data ,RNA Splice Sites ,RNA Splicing ,Sarcoplasmic Reticulum Calcium-Transporting ATPases ,Segmental Duplications ,Genomic ,Vertebrates - Abstract
An intron is an extended genomic feature whose function requires multiple constrained positions (donor and acceptor splice sites, a branch point, a polypyrimidine tract and suitable splicing enhancers) that may be distributed over hundreds or thousands of nucleotides. New introns are therefore unlikely to emerge by incremental accumulation of functional sub-elements. Here we demonstrate that a functional intron can be created de novo in a single step by a segmental genomic duplication. This experiment recapitulates in vivo the birth of an intron that arose in the ancestral jawed vertebrate lineage nearly half a billion years ago.
- Published
- 2011
49. Complex evolution of the GSTM gene family involves sharing of GSTM1 deletion polymorphism in humans and chimpanzees
- Author
-
M. Saitou, Y. Satta, O. Gokcumen, and T. Ishida
- Subjects
Copy number variation ,Structural variants ,Detoxifying gene family ,Primates ,Gene conversions ,Segmental duplications ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Background: The common deletion of the glutathione S-transferase Mu 1 (GSTM1) gene in humans has been shown to be involved in xenobiotic metabolism and associated with bladder cancer. However, the evolution of this deletion has not been investigated. Results: In this study, we conducted comparative analyses of primate genomes. We demonstrated that the GSTM gene family has evolved through multiple structural variations, involving gene duplications, losses, large inversions and gene conversions. We further showed experimentally that GSTM1 was polymorphically deleted in both humans and chimpanzees, through independent deletion events. To generalize our results, we searched for genic deletions that are polymorphic in both humans and chimpanzees. Consequently, we found only two such deletions among the thousands that we searched, one of them being the GSTM1 deletion and the other, surprisingly, being another metabolizing gene, UGT2B17. Conclusions: Overall, our results support the emerging notion that metabolizing gene families, such as the GSTM, NAT, UGT and CYP, have been evolving rapidly through gene duplication and deletion events in primates, leading to complex structural variation within and among species with unknown evolutionary consequences.
- Published
- 2018
50. Transposable element subfamily annotation has a reproducibility problem.
- Author
-
Carey, Kaitlin M., Patterson, Gilia, and Wheeler, Travis J.
- Subjects
ANNOTATIONS ,HUMAN genome ,CHIMPANZEES ,DATA analysis - Abstract
Background: Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee. Results: We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite the fact that they are expected to be annotated as belonging to the same subfamily. Point mutations and homologous recombination appear to be responsible for some of this discordant annotation (particularly in the young Alu family), but are unlikely to fully explain the annotation unreliability. Conclusions: The surprisingly high level of disagreement in subfamily annotation of homologous sequences highlights a need for further research into definition of TE subfamilies, methods for representing subfamily annotation confidence of TE instances, and approaches to better utilizing such nuanced annotation data in downstream analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2021