101 results on '"TANDEM REPEATS"'
Search Results
2. Identification and subtyping of Cryptosporidium spp. using Nanopore sequencing
- Author
- Svensson Henningsson, Isabelle
- Abstract
Cryptosporidium is a parasite that causes gastrointestinal issues such as diarrhoea and stomach pain. The main transmission routes are through contaminated water or food, between humans and from animals to humans. Cryptosporidium infects through oocysts, which contain four sporozoites that are released when entering a host and can continue to breed inside the body. Cryptosporidium can cause massive outbreaks if established in a water source used for drinking water. To prevent and detect an outbreak it is important to trace the transmission back to the source. The GP60 gene is used to identify and subtype Cryptosporidium and is a very useful tool during contact tracing. The purpose of this study was to identify species and subtype of Cryptosporidium using nanopore sequencing. In this study the GP60 gene was amplified using a nested PCR protocol and then sequenced using nanopore sequencing. The sequences acquired were then used to make a search in BLAST to identify the species. The GP60 subtyping method uses the hypervariable region of the GP60 gene. A series of tandem repeats is used to identify the subtype. In this study seven positive Cryptosporidium faeces samples were amplified and sequenced. Nanopore sequencing was possible for five of the seven samples, with C. parvum identified in four of these samples. Targeting the GP60 gene to determine species and subtype works well for the most common human pathogen species of Cryptosporidium. Further optimization is required before the method can be implemented for diagnostic use.
- Published
- 2024
4. Intrinsic disorder and tandem repeats - match made in evolution : Computational studies of molecular evolution
- Author
- Lundström, Oxana
- Abstract
Proteins are both the building blocks and workers of the cell, carrying out most of the important functions. For a long time, their structure has been regarded as the primary factor for their function, but intrinsically disordered proteins demonstrate an alternative to this paradigm. Disordered proteins can temporarily assume different forms based on their interactions with other molecules and play critical roles in several biological processes, including cell signaling and regulation of gene expression. Tandem repeats are repeated patterns in genetic sequence. The role of tandem repeats in many protein structures is well documented today, but their role in disordered proteins is not entirely clear. This thesis aims to shed light on the mechanisms by which protein disorder and tandem repeats are linked. Only 2.5% of residues in all known protein sequences are characterized by the overlap of tandem repeats and protein disorder as described in Paper III, but many of these proteins have crucial functions and are linked to human diseases. Short tandem repeats emerge in this study as most frequently occurring in disordered regions. Genetic variation in disordered proteins accounts for length differences in eukaryotic genes (Paper I) and many orphan, recently evolved proteins, are disordered due to high GC content (Paper II). A medical application of this research is illustrated in the thesis with examples of variations in short tandem repeats (STRs) and their role in human diseases. Paper IV presents a comprehensive resource of human STR variation and Paper V illustrates how it can be used to identify specific STRs of interest, such as in the case of colorectal cancer where variations in certain STRs lead to altered gene expression patterns in tumors.
- Published
- 2023
5. Tandem Repeat DNA Provides Many Cytological Markers for Hybrid Zone Analysis in Two Subspecies of the Grasshopper Chorthippus parallelus
- Author
- Navarro-Dominguez, Beatriz, Cabrero, Josefa, Lopez-Leon, Maria Dolores, Ruiz-Ruano, Francisco J., Pita, Miguel, Bella, Jose L., and Camacho, Juan Pedro M.
- Abstract
Recent advances in next generation sequencing (NGS) have greatly increased our understanding of non-coding tandem repeat (TR) DNA. Here we show how TR DNA can be useful for the study of hybrid zones (HZ), as it serves as a marker to identify introgression in areas where two biological entities come in contact. We used Illumina libraries to analyse two subspecies of the grasshopper Chorthippus parallelus, which currently form a HZ in the Pyrenees. We retrieved a total of 152 TR sequences, and used fluorescent in situ hybridization (FISH) to map 77 families in purebred individuals from both subspecies. Our analysis revealed 50 TR families that could serve as markers for analysis of this HZ, using FISH. Differential TR bands were unevenly distributed between chromosomes and subspecies. Some of these TR families yielded FISH bands in only one of the subspecies, suggesting the amplification of these TR families after the geographic separation of the subspecies in the Pleistocene. Our cytological analysis of two TR markers along a transect of the Pyrenean hybrid zone showed asymmetrical introgression of one subspecies into the other, consistent with previous findings using other markers. These results demonstrate the reliability of TR-band markers for hybrid zone studies.
- Published
- 2023
12. The Low-Copy-Number Satellite DNAs of the Model Beetle Tribolium castaneum
- Author
- Gržan, Tena, Dombi, Mira, Despot-Slade, Evelin, Veseljak, Damira, Volarić, Marin, Meštrović, Nevenka, Plohl, Miroslav, and Mravinac, Brankica
- Abstract
The red flour beetle Tribolium castaneum is an important pest of stored agricultural products and the first beetle whose genome was sequenced. So far, one high-copy-number and ten moderate-copy-number satellite DNAs (satDNAs) have been described in the assembled part of its genome. In this work, we aimed to catalog the entire collection of T. castaneum satDNAs. We resequenced the genome using Illumina technology and predicted potential satDNAs via graph-based sequence clustering. In this way, we discovered 46 novel satDNAs that occupied a total of 2.1% of the genome and were, therefore, considered low-copy-number satellites. Their repeat units, preferentially 140–180 bp and 300–340 bp long, showed a high A + T composition ranging from 59.2 to 80.1%. In the current assembly, we annotated the majority of the low-copy-number satDNAs on one or a few chromosomes, discovering mainly transposable elements in their vicinity. The current assembly also revealed that many of the in silico predicted satDNAs were organized into short arrays not much longer than five consecutive repeats, and some of them also had numerous repeat units scattered throughout the genome. Although 20% of the unassembled genome sequence masked the genuine state, the predominance of scattered repeats for some low-copy satDNAs raises the question of whether these are essentially interspersed repeats that occur in tandem only sporadically, with the potential to be satDNA “seeds”.
- Published
- 2023
14. Unique features of satellite DNA transcription in different tissues of Caenorhabditis elegans
- Author
- Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals, Subirana Torrent, Juan A., and Messeguer Peypoch, Xavier
- Abstract
A large part of the genome is known to be transcribed as non-coding DNA including some tandem repeats (satellites) such as telomeric/centromeric satellites in different species. However, there has been no detailed study on the eventual transcription of the interspersed satellites found in many species. In the present paper, we studied the transcription of the abundant DNA satellites in the nematode Caenorhabditis elegans using available RNA-Seq results. We found that many of them have been transcribed, but usually in an irregular manner; different regions of a satellite have been transcribed with variable efficiency. Satellites with a similar repeat sequence also have a different transcription pattern depending on their position in the genome. We also describe the peculiar features of satellites associated with Helitron transposons in C. elegans. Our demonstration that some satellite RNAs are transcribed adds a new family of non-coding RNAs, a new element in the world of RNA interference, with new paths for the control of mRNA translation. This is a field that requires further investigation and will provide a deeper understanding of gene expression and control. This work was supported by grant PID2021-122830OB-C43, funded by MCIN/AEI/10.13039/501100011033 and by “ERDF: A way of making Europe”. Peer reviewed. Postprint (published version).
- Published
- 2023
16. A classical revival: Human satellite DNAs enter the genomics era.
- Author
- Altemose, Nicolas
- Abstract
The classical human satellite DNAs, also referred to as human satellites 1, 2 and 3 (HSat1, HSat2, HSat3, or collectively HSat1-3), occur on most human chromosomes as large, pericentromeric tandem repeat arrays, which together constitute roughly 3% of the human genome (100 megabases, on average). Even though HSat1-3 were among the first human DNA sequences to be isolated and characterized at the dawn of molecular biology, they have remained almost entirely missing from the human genome reference assembly for 20 years, hindering studies of their sequence, regulation, and potential structural roles in the nucleus. Recently, the Telomere-to-Telomere Consortium produced the first truly complete assembly of a human genome, paving the way for new studies of HSat1-3 with modern genomic tools. This review provides an account of the history and current understanding of HSat1-3, with a view towards future studies of their evolution and roles in health and disease.
- Published
- 2022
17. Quality Measures in Process Mining: Tackling the scalability challenge
- Author
- Reissner, Daniel André
- Abstract
Nowadays, companies need to conform to strict compliance regulations. In the financial sector, this is particularly relevant as banks and insurance institutes are strictly regulated and violations can incur high fines. For example, Anti-Money Laundering and Know Your Customers regulations require banks to screen every transaction, report suspicious ones to regulatory bodies and acquire the identities of all their customers. In 2020, violations caused 10.6 billion dollars in fines for banks all over the world. Modern enterprise systems such as a loan management system or an enterprise resource planning system record all executed activities of business processes such as transactions in the form of event logs. These logs capture a process instance as a trace, which is a sequence of activities executed in the instance. Process mining aims to extract insights from event logs in order to assist organizations in their operational excellence. It offers a framework to identify compliance issues early by means of automated discovery and conformance checking techniques. Automated discovery can discover current as-is process models from an event log that can be analysed for compliance breaches. Conformance checking on the other hand can compare the executions of a log against a normative process model capturing compliance regulations to identify deviations. Quality measures in process mining are of fundamental importance to automated discovery and conformance checking as they allow analysts to select a high quality model from a set of automatically discovered models and more precisely pinpoint conformance issues. In particular, this thesis investigates two such quality measures: Fitness and Generalization. Fitness measures how well a process model can replay the traces of an event log, where activities that cannot be replayed on a normative process model are compliance issues. Trace alignments are the central artifact of fitness that are investigated in this thesis. They rela
- Published
- 2022
18. Structure, Organization, and Evolution of Satellite DNAs : Insights from the Drosophila repleta and D. virilis Species Groups.
- Author
- Kuhn, Gustavo C S, Heringer, Pedro, and Dias, Guilherme
- Abstract
The fact that satellite DNAs (satDNAs) in eukaryotes are abundant genomic components, can perform functional roles, but can also change rapidly across species while being homogenous within a species, makes them an intriguing and fascinating genomic component to study. It is also becoming clear that satDNAs represent an important piece in genome architecture and that changes in their structure, organization, and abundance can affect the evolution of genomes and species in many ways. Since the discovery of satDNAs more than 50 years ago, species from the Drosophila genus have continuously been used as models to study several aspects of satDNA biology. These studies have been largely concentrated in D. melanogaster and closely related species from the Sophophora subgenus, even though the vast majority of all Drosophila species belong to the Drosophila subgenus. This chapter highlights some studies on the satDNA structure, organization, and evolution in two species groups from the Drosophila subgenus: the repleta and virilis groups. We also discuss and review the classification of other abundant tandem repeats found in these species in the light of the current information available.
- Published
- 2021
19. First Description of a Satellite DNA in Manatees' Centromeric Regions.
- Author
- Valeri, Mirela Pelizaro, Dias, Guilherme, do Espírito Santo, Alice Alves, Moreira, Camila Nascimento, Yonenaga-Yassuda, Yatiyo, Sommer, Iara Braga, Kuhn, Gustavo C S, and Svartman, Marta
- Abstract
Trichechus manatus and Trichechus inunguis are the two Sirenia species that occur in the Americas. Despite their increasing extinction risk, many aspects of their biology remain understudied, including the repetitive DNA fraction of their genomes. Here we used the sequenced genome of T. manatus and TAREAN to identify satellite DNAs (satDNAs) in this species. We report the first description of TMAsat, a satDNA comprising ~0.87% of the genome, with ~684bp monomers and centromeric localization. In T. inunguis, TMAsat showed similar monomer length, chromosome localization and conserved CENP-B box-like motifs as in T. manatus. We also detected this satDNA in the Dugong dugon and in the now extinct Hydrodamalis gigas genomes. The neighbor-joining tree shows that TMAsat sequences from T. manatus, T. inunguis, D. dugon, and H. gigas lack species-specific clusters, which disagrees with the predictions of concerted evolution. We detected a divergent TMAsat-like homologous sequence in elephants and hyraxes, but not in other mammals, suggesting this sequence was already present in the common ancestor of Paenungulata, and later became a satDNA in the Sirenians. This is the first description of a centromeric satDNA in manatees and will facilitate the inclusion of Sirenia in future studies of centromeres and satDNA biology.
- Published
- 2021
20. Identifying and characterizing de novo tandem repeat mutations and their contribution to autism spectrum disorders
- Author
-
Mitra, Ileena and Gymrek, Melissa
- Abstract
Genetic factors are known to make a large contribution to the risk of Autism Spectrum Disorders (ASD). The heritability of ASD is estimated to be over 50%, and it is estimated that de novo rare variants contribute in about 30% of simplex autism-affected cases. To date, population sequencing studies have been limited to analyzing single nucleotide variants (SNVs), small insertions and deletions (indels), or copy number variants (CNVs). This dissertation expands genetic research to further identify potential genomic regions and pathogenic mutations associated with ASD. Tandem repeats (TRs) are a class of repetitive structural variants composed of 1-20 base pair repeating units. TRs exhibit mutation rates that are orders of magnitude higher than SNPs, indels, or CNVs (6), and thus represent one of the largest sources of human genomic variability (4,5). TRs are often associated with diseases characterized by neurological and developmental symptoms (7–9), for example, Fragile X Syndrome, the most prevalent genetic cause of ASD. To date, direct studies of de novo TR mutations have been limited in population genetic studies. In this dissertation, I present a framework for population-scale characterization of genome-wide de novo TR mutations and their contribution to the genetic etiology of ASD. In my first chapter, I present my bioinformatics pipeline using MonSTR to analyze whole genome sequencing data to identify high-confidence, germline de novo TRs within parent-offspring trios. MonSTR, a novel statistical method, takes genotype likelihood values reported by a TR variant caller as input and estimates the posterior probability of a mutation resulting in a repeat copy number change at each TR locus in each child. In the following chapters, I present the results from identifying de novo TR mutations in autism-affected and unaffected children. I characterize patterns of TR mutational mechanism in the general population, in which I found an average of 54 de novo TRs per in
- Published
- 2021
21. Methods for studying the genome-wide landscape of tandem repeats
- Author
-
Mousavi, Nima, Gymrek, Melissa, and Mirarab, Siavash
- Abstract
Tandem Repeats (TRs) are a class of genetic variants formed by motifs of 1-20 nucleotides repeating in tandem. Previous studies show that expansion at specific TR loci is the leading cause of dozens of Mendelian disorders such as Huntington's disease and Fragile X syndrome. Furthermore, copy numbers at TR loci are correlated with complex traits such as gene expression. Tandem repeats are highly mutable and therefore well suited to the study of genetic diversity. However, current bioinformatics pipelines are often incapable of processing these loci accurately. Challenges in sequencing, alignment, and interpretation have led to TR loci being overlooked in many studies. We have created a method for genome-wide genotyping of TRs and a toolkit for processing, filtering, and quality control of TR callsets. These methods have allowed us and the community to study repeat expansions on a genome-wide scale. In addition, we have applied our work to study de novo variants contributing to Autism Spectrum Disorder risk and have found multiple candidate TRs. Another application of our methods is a novel tool for creating an ensemble callset of TRs across a large population. Our efforts in creating methods and applying them to various applications have allowed us to gain a better understanding of TRs and their genetic diversity on a population scale.
- Published
- 2021
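The definition used in the abstract above (motifs of 1-20 nucleotides repeating in tandem) can be illustrated with a minimal Python sketch. This is not the genotyping method described in the dissertation: it only scans a single sequence for perfect repeats. The function name `find_tandem_repeats` and the `min_copies` threshold are assumptions made for illustration.

```python
import re

def find_tandem_repeats(seq, min_motif=1, max_motif=20, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Hypothetical helper: exact matches only, no mismatches or indels.
    """
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AA" is reported once, as motif "A").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

print(find_tandem_repeats("TTGACAGCAGCAGCAGTT"))
```

Real TR callers work from aligned reads and tolerate imperfect copies, so a perfect-match scan like this one is useful only for building intuition about motif length and copy number.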
23. TRAL 2.0 : tandem repeat detection with circular profile hidden Markov models and evolutionary aligner
- Author
-
Delucchi, Matteo, Näf, Paulina, Bliven, Spencer, and Anisimova, Maria
- Abstract
The Tandem Repeat Annotation Library (TRAL) focuses on analyzing tandem repeat units in genomic sequences. TRAL can integrate and harmonize tandem repeat annotations from a large number of external tools, and provides a statistical model for evaluating and filtering the detected repeats. TRAL version 2.0 includes new features such as a module for identifying repeats from circular profile hidden Markov models, a new repeat alignment method based on the progressive Poisson Indel Process, an improved installation procedure and a Docker container. TRAL is an open-source Python 3 library and is available, together with documentation and tutorials, via vital-it.ch/software/tral.
- Published
- 2021
24. In silico detection of variable number tandem repeats associated with Alzheimer’s disease from short-read sequencing data
- Author
-
Lucas, Francesca (author)
- Abstract
Motivation: Alzheimer’s disease (AD) is a highly prevalent disease whose genetic risk factors remain largely unknown. One potential genetic risk factor is tandem repeat expansions, which have been associated with over 40 diseases, most of which affect the nervous system. Detecting VNTRs from short-read data is a challenging task, leaving many VNTRs unidentified. To date only one variable number tandem repeat (VNTR) expansion (in the ABCA7 gene) has been linked to AD. We hypothesize there are many more VNTR expansions to be discovered that associate with an increased risk of AD. Results: We created a pipeline with which we overcame the common limitations of VNTR detection (namely, the need for a predefined set of repeats and limited detectable VNTR sizes due to read length). We performed a genome-wide search for VNTRs with a motif size ≥ 7 bp that show repeat size variations associated with AD. We detected 71 VNTR expansions and 1242 contractions, including expansions in genes ADAMTSL3, ARHGEF10, DIP2C, EVC2, GRM8, MPPED1, PID1 and an expansion in the SCIMP gene close to a well-known AD single nucleotide polymorphism (SNP). Our pipeline is, to our knowledge, one of the very few to detect VNTRs exceeding read length without a predefined set of repeats. It is able to detect both previously reported and novel VNTRs, resulting in a promising set of VNTRs showing an association with AD.
- Published
- 2021
27. A Long-Term Conserved Satellite DNA That Remains Unexpanded in Several Genomes of Characiformes Fish Is Actively Transcribed
- Author
-
dos Santos, Rodrigo Zeni, Calegari, Rodrigo Milan, de Andrade Silva, Duilio Mazzoni Zerbinato, Ruiz-Ruano, Francisco J., Melo, Silvana, Oliveira, Claudio, Foresti, Fausto, Uliano-Silva, Marcela, Porto-Foresti, Fabio, and Utsunomia, Ricardo
- Abstract
Eukaryotic genomes contain large amounts of repetitive DNA sequences, such as tandemly repeated satellite DNAs (satDNAs). These sequences are highly dynamic and tend to be genus- or species-specific due to their particular evolutionary pathways, although there are a few unusual cases of satDNAs conserved over long periods of time. Here, we used multiple approaches to reveal that a satDNA named CharSat01-52 originated in the last common ancestor of Characoidei fish, a superfamily within the Characiformes order, ∼140-78 Ma, whereas its nucleotide composition has remained considerably conserved in several taxa. We show that 14 distantly related species within Characoidei share the presence of this satDNA, which is highly amplified and clustered in subtelomeric regions in a single species (Characidium gomesi), while it remained organized as small clusters in all the other species. Defying predictions of the molecular drive of satellite evolution, CharSat01-52 shows similar values of intra- and interspecific divergence. Although we did not provide evidence for a specific functional role of CharSat01-52, its transcriptional activity was demonstrated in different species. In addition, we identified short tandem arrays of CharSat01-52 embedded within single-molecule real-time long reads of Astyanax paranae (536 bp-3.1 kb) and A. mexicanus (501 bp-3.9 kb). Such arrays consisted of head-to-tail repeats and could be found interspersed with other sequences, inverted sequences, or neighbored by other satellites. Our results provide a detailed characterization of an old and conserved satDNA, challenging general predictions of satDNA evolution.
- Published
- 2021
30. DNA satellites are transcribed as part of the non-coding genome in eukaryotes and bacteria
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals, Subirana Torrent, Juan A., and Messeguer Peypoch, Xavier
- Abstract
It has been shown in recent years that many repeated sequences in the genome are expressed as RNA transcripts, although the role of such RNAs is poorly understood. Some isolated and tandem repeats (satellites) have been found to be transcribed, such as mammalian Alu sequences and telomeric/centromeric satellites in different species. However, there is no detailed study on the possible transcription of the interspersed satellites found in many species. Therefore, we decided to study for the first time the transcription of the abundant DNA satellites in the bacterium Bacillus coagulans and in the nematode Caenorhabditis elegans. We have updated the data for C. elegans satellites using the latest version of the genome. We analyzed the transcription of satellites in both species in available RNA-seq results and found that they are widely transcribed. Our demonstration that satellite RNAs are transcribed adds a new family of non-coding RNAs. This is a field that requires further investigation and will provide a deeper understanding of gene expression and control. This work was supported by Ministerio de Ciencia e Innovación, Spain [Project RTI2018-094403-B-C33 funded by MCIN/AEI 10.13039/501100011033/FEDER]. Peer Reviewed. Postprint (published version).
- Published
- 2021
35. Tandem repeats in Bacillus: Unique features and taxonomic distribution
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals, Subirana Torrent, Juan A., and Messeguer Peypoch, Xavier
- Abstract
Little is known about DNA tandem repeats across prokaryotes. We have recently described an enigmatic group of tandem repeats in bacterial genomes with a constant repeat size but variable sequence. These findings strongly suggest that tandem repeat size in some bacteria is under strong selective constraints. Here, we extend these studies and describe tandem repeats in a large set of Bacillus species. Some species have very few repeats, while other species have a large number. Most tandem repeats have repeat units of constant size (either 52 or 20–21 nt) but variable sequence. We characterize these intriguing tandem repeats in detail. Individual species have several families of tandem repeats with the same repeat length and different sequences. This result is in strong contrast with eukaryotes, where tandem repeats of many sizes are found in any species. We discuss the possibility that they are transcribed as small RNA molecules. They may also be involved in the stabilization of the nucleoid through interaction with proteins. We also show that the distribution of tandem repeats in different species has a taxonomic significance. The data we present for all tandem repeats and their families in these bacterial species will be useful for further genomic studies. This work was supported by Ministerio de Ciencia e Innovación – Agencia Estatal de Investigación, Spain (Project RTI2018-094403-B-C33) and FEDER. Peer Reviewed. Postprint (published version).
- Published
- 2021
36. Eight Million Years of Satellite DNA Evolution in Grasshoppers of the Genus Schistocerca Illuminate the Ins and Outs of the Library Hypothesis
- Author
-
Palacios-Gimenez, Octavio M., Milani, Diogo, Song, Hojun, Mart, Dardo A., López-León, Maria D., Ruiz-Ruano, Francisco J., Camacho, Juan Pedro M., and Cabral-de-Mello, Diogo C.
- Abstract
Satellite DNA (satDNA) is an abundant class of tandemly repeated noncoding sequences, showing a high rate of change in sequence, abundance, and physical location. However, the mechanisms promoting these changes are still controversial. The library model was put forward to explain the conservation of some satDNAs for long periods, predicting that related species share a common collection of satDNAs, which mostly experience quantitative changes. Here, we tested the library model by analyzing three satDNAs in ten species of Schistocerca grasshoppers. This group represents valuable material because it diversified during the last 7.9 Myr across the American continent from the African desert locust (Schistocerca gregaria), and thus illuminates the direction of evolutionary changes. By combining bioinformatic and cytogenetic approaches, we tested whether these three satDNA families found in S. gregaria are also present in nine American species, and whether differential gains and/or losses have occurred in the lineages. We found that the three satDNAs are present in all species but display remarkable interspecies differences in their abundance and sequences while being highly consistent with genus phylogeny. The number of chromosomal loci where satDNA is present was also consistent with phylogeny for two satDNA families but not for the other. Our results suggest eminently chance events for satDNA evolution. Several evolutionary trends clearly imply either massive amplifications or contractions, thus closely fitting the library model prediction that changes are mostly quantitative. Finally, we found that satDNA amplifications or contractions may influence the evolution of monomer consensus sequences, with chance playing a major role in drift-like dynamics.
- Published
- 2020
- Full Text
- View/download PDF
42. Unique features of tandem repeats in bacteria
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals, Subirana Torrent, Juan A., and Messeguer Peypoch, Xavier
- Abstract
DNA tandem repeats, or satellites, are well described in eukaryotic species, but little is known about their prevalence across prokaryotes. Here, we performed the most complete characterization to date of satellites in bacteria. We identified 121,638 satellites from 12,233 fully sequenced and assembled bacterial genomes with a very uneven distribution. We also determined the families of satellites which have a related sequence. There are 85 genomes that are particularly satellite rich and contain several families of satellites of yet unknown function. Interestingly, we only found two main types of noncoding satellites, depending on their repeat sizes, 22/44 or 52 nucleotides (nt). An intriguing feature is the constant size of the repeats in the genomes of different species, whereas their sequences show no conservation. Individual species also have several families of satellites with the same repeat length and different sequences. This result is in marked contrast with previous findings in eukaryotes, where noncoding satellites of many sizes are found in any species investigated. We describe in greater detail these noncoding satellites in the spirochete Leptospira interrogans and in several bacilli. These satellites undoubtedly play a specific role in the species which have acquired them. We discuss the possibility that they represent binding sites for transcription factors not previously described or that they are involved in the stabilization of the nucleoid through interaction with proteins. IMPORTANCE: We found an enigmatic group of noncoding satellites in 85 bacterial genomes with a constant repeat size but variable sequence. This pattern of DNA organization is unique and had not been previously described in bacteria. These findings strongly suggest that satellite size in some bacteria is under strong selective constraints and thus that satellites are very likely to play a fundamental role.
We also provide a list and properties of all satellites in 12,233 genomes. This work was supported by Ministerio de Ciencia e Innovación–Agencia Estatal de Investigación, Spain (projects TIN2015-69175-C4-3-R and RTI2018-094403-B-C33), and FEDER. Peer Reviewed. Postprint (published version).
- Published
- 2020
43. Satellites in the prokaryote world
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals, Subirana Torrent, Juan A., and Messeguer Peypoch, Xavier
- Abstract
Background: Satellites or tandem repeats are very abundant in many eukaryotic genomes. Occasionally they have been reported to be present in some prokaryotes, but to our knowledge there is no general comparative study on their occurrence. For this reason we present here an overview of the distribution and properties of satellites in a set of representative species. Our results provide novel insights into the evolutionary relationship between eukaryotes, Archaea and Bacteria. Results: We have searched for all possible satellites present in the NCBI reference group of genomes in Archaea (142 species) and in Bacteria (119 species), detecting 2735 satellites in Archaea and 1067 in Bacteria. We have found that the distribution of satellites is very variable in different organisms. The archaeal Methanosarcina class stands out for the large number of satellites in their genomes. Satellites from a few species have similar characteristics to those in eukaryotes, but most species have very few satellites: only 21 species in Archaea and 18 in Bacteria have more than 4 satellites/Mb. The distribution of satellites in these species is reminiscent of what is found in eukaryotes, but we find two significant differences: most satellites have a short length and many of them correspond to segments of genes coding for amino acid repeats. Transposition of non-coding satellites throughout the genome occurs rarely: only in the bacterium Leptospira interrogans and the archaeon Methanocella conradii have we detected satellite families of transposed satellites with long repeats. Conclusions: Our results demonstrate that the presence of satellites in the genome is not an exclusive feature of eukaryotes. We have described a few prokaryotes which do contain satellites. We present a discussion on their eventual evolutionary significance. Peer Reviewed. Postprint (published version).
- Published
- 2019
44. Computational characterization of the mtORF of pocilloporid corals: insights into protein structure and function in Stylophora lineages from contrasting environments.
- Author
-
Banguera Hinestroza, Eulalia, Ferrada, Evandro, Sawall, Yvonne, and Flot, Jean-François
- Abstract
More than a decade ago, a new mitochondrial Open Reading Frame (mtORF) was discovered in corals of the family Pocilloporidae and has been used since then as an effective barcode for these corals. Recently, mtORF sequencing revealed the existence of two differentiated Stylophora lineages occurring in sympatry along the environmental gradient of the Red Sea (18.5°C to 33.9°C). In the endemic Red Sea lineage RS_LinB, the mtORF and the heat shock protein gene hsp70 uncovered similar phylogeographic patterns strongly correlated with environmental variations. This suggests that the mtORF too might be involved in thermal adaptation. Here, we used computational analyses to explore the features and putative function of this mtORF. In particular, we tested the likelihood that this gene encodes a functional protein and whether it may play a role in adaptation. Analyses of full mitogenomes showed that the mtORF originated in the common ancestor of Madracis and other pocilloporids, and that it encodes a transmembrane protein differing in length and domain architecture among genera. Homology-based annotation and the relative conservation of metal-binding sites revealed traces of an ancient hydrolase catalytic activity. Furthermore, signals of pervasive purifying selection, lack of stop codons in 1830 sequences analyzed, and a codon-usage bias similar to that of other mitochondrial genes indicate that the protein is functional, i.e. not a pseudogene. Other features, such as intrinsically disordered regions, tandem repeats, and signals of positive selection particularly in Stylophora RS_LinB populations, are consistent with a role of the mtORF in adaptive responses to environmental changes. SCOPUS: ar.j; info:eu-repo/semantics/published.
- Published
- 2019
45. Virulotyping of Salmonella Enterica Serovar Typhi Isolates from Pakistan: Absence of Complete SPI-10 in Vi Negative Isolates
- Author
-
Inmunología, microbiología y parasitología, Immunologia, mikrobiologia eta parasitologia, Liaquat, Sadia, Sarwar, Yasra, Ali, Aamir, Haque, Abdul, Farooq, Muhammad, Martínez Ballesteros, Ilargi, Laorden Muñoz, Lorena, Garaizar Candina, Javier, and Bikandi Bikandi, Joseba
- Abstract
The pathogenesis of Salmonella enterica serovar Typhi (S. Typhi), the cause of typhoid fever in humans, is mainly attributed to horizontally acquired DNA elements. Salmonella pathogenicity islands (SPIs) are indubitably the most important form of horizontally acquired DNA with respect to pathogenesis of this bacterium. The insertion or deletion of any of these transferable SPIs may have an impact on the virulence potential of S. Typhi. In this study, the virulence potential and genetic relatedness of 35 S. Typhi isolates, collected from 2004 to 2013, were determined by identification of SPI and non-SPI virulence factors through a combination of techniques including virulotyping, Whole Genome Sequencing (WGS), and Variable Number of Tandem Repeats (VNTR) profiling. In order to determine the virulence potential of local S. Typhi isolates, 56 virulence-related genes were studied by PCR. These genes are located in the core as well as the accessory genome (SPIs and plasmid). Major variations among the studied virulence determinants were found in the case of SPI-7 and SPI-10 associated genes. On the basis of the presence of virulence-related genes, the studied S. Typhi isolates from Pakistan were clustered into two virulotypes, Vi-positive and Vi-negative. Interestingly, SPI-7 and SPI-10 were collectively absent or present in Vi-negative and Vi-positive strains, respectively. Two Vi-negative and 11 Vi-positive S. Typhi strains were also analyzed by WGS, and the results supported the PCR results. Genetic diversity was tested by VNTR-based molecular typing. All 35 isolates were clustered into five groups. Overall, all Vi-negative isolates were placed in a single group (T5) whereas Vi-positive isolates were grouped into four types. Vi-negative and Vi-positive isolates were mutually exclusive. This is the first report on the comparative distribution of SPI and non-SPI related virulence genes in Vi-negative and Vi-positive S. Typhi isolates with an impo
- Published
- 2018
48. Computing All Distinct Squares in Linear Time for Integer Alphabets
- Author
-
Bannai, Hideo, Inenaga, Shunsuke, and Köppl, Dominik
- Abstract
Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear in the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Aside from that, we elaborate an algorithm computing the longest previous table in a succinct representation using compressed working space.
- Published
- 2017
- Full Text
- View/download PDF
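As a reading aid for record 48: a square is a substring of the form uu, that is, a non-empty block u immediately followed by a copy of itself (the shortest kind of tandem repeat). The sketch below only illustrates what "the set of all distinct squares" means. It is a brute-force check that takes far more than linear time, and it is not the algorithm of Bannai, Inenaga, and Köppl. The function name `distinct_squares` is a hypothetical name chosen for this illustration.

```python
def distinct_squares(text: str) -> set[str]:
    """Return every distinct substring of `text` of the form uu, u non-empty.

    Brute-force illustration only: each start position and each half-length
    is tried, so the running time is cubic in the worst case, not linear.
    """
    n = len(text)
    found: set[str] = set()
    for start in range(n):
        # The half-length is bounded so that both copies of u fit in the text.
        for half in range(1, (n - start) // 2 + 1):
            first = text[start:start + half]
            second = text[start + half:start + 2 * half]
            if first == second:
                found.add(first + second)
    return found


if __name__ == "__main__":
    # "aa" occurs twice in "aabaab" but is reported once, alongside "aabaab".
    print(sorted(distinct_squares("aabaab")))
```

Collecting results in a set is what makes the squares distinct: repeated occurrences of the same square collapse into a single entry.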
49. Construction of a Genetic Information Analysis System and Studies of Protein Repeat Sequences
- Abstract
Remarkable progress in molecular biology has led to an explosive increase in sequence data for nucleic acids and for the amino acids of proteins. Out of these data a new field has emerged, called life informatics or genetic informatics, which approaches the systematization of biological knowledge and the formulation of the principles of life from the standpoint of information. Within it, genetic information analysis, which centers on data analysis and yields information on the structure, function, and evolution of proteins and genes, is developing rapidly. In this article, we first outline genetic information analysis and describe the computer-based analysis system being constructed in our laboratory. We then summarize results obtained with this system: 1) sequence analyses (sequence analysis and molecular evolution of leucine-rich repeats and homopolymers within many proteins), 2) three-dimensional structure predictions (the S100B and amelogenin proteins), and 3) models of the interaction between tandem repeats and nucleic acids (the AlgR3 and US11 proteins, whose tandem repeats interact with DNA or RNA).
- Published
- 2016
50. Unraveling the effect of genomic structural changes in the rhesus macaque - implications for the adaptive role of inversions
- Author
-
Ministerio de Economía y Competitividad (España), Ullastres, Ana, Farré, Marta, Capilla, Laia, and Ruiz-Herrera, Aurora
- Abstract
[Background] By reshuffling genomes, structural genomic reorganizations provide genetic variation on which natural selection can work. Understanding the mechanisms underlying this process has been a long-standing question in evolutionary biology. In this context, our purpose in this study is to characterize the genomic regions involved in structural rearrangements between human and macaque genomes and determine their influence on meiotic recombination as a way to explore the adaptive role of genome shuffling in mammalian evolution. [Results] We first constructed a highly refined map of the structural rearrangements and evolutionary breakpoint regions in the human and rhesus macaque genomes based on orthologous genes and whole-genome sequence alignments. Using two different algorithms, we refined the genomic position of known rearrangements previously reported by cytogenetic approaches and described new putative micro-rearrangements (inversions and indels) in both genomes. A detailed analysis of the rhesus macaque genome showed that evolutionary breakpoints are in gene-rich regions, being enriched in GO terms related to the immune system. We also identified defense-response genes within a chromosome inversion fixed in the macaque lineage, underlining the relevance of structural genomic changes in evolutionary and/or adaptation processes. Moreover, by combining in silico and experimental approaches, we studied the recombination pattern of specific chromosomes that have undergone rearrangements between human and macaque lineages. [Conclusions] Our data suggest that adaptive alleles, in this case genes involved in the immune response, might have been favored by genome rearrangements in the macaque lineage. © 2014 Ullastres et al.; licensee BioMed Central Ltd.
- Published
- 2014