Identification of DNA Termini in Sequencing Data through Combined Analysis of End Capture and Local Strand Bias
- Publication Year :
- 2023
- Publisher :
- Stanford Digital Repository, 2023.
Abstract
- Detecting DNA termini, such as ends of linear extrachromosomal DNA, plays an essential role in understanding the structure and functions of DNA molecules. Here we describe an approach combining direct and indirect computational methods to detect DNA termini from next-generation short-read sequencing. While a direct inference of ends can come from mapping the specific capture points of DNA fragments, this approach is insufficient for analytical pipelines where the DNA termini are not captured. Thus, we add an indirect detection of ends based on strand bias, the difference in sequence representation between the plus and minus strands of DNA in a dataset. Termini are reflected by a strong strand bias, with inward-facing reads greatly enriched over outward-facing reads in the immediate proximity of any end. Applying this analysis to negative control regions (where DNA is continuous, with no known termini), we observe no strong end capture peaks or strand bias. Applying it to positive control regions, where known DNA termini are present, yields strong strand bias signals even in cases where blocked termini prevent end capture (for a protein-blocked adenovirus), or where ends are not explicitly captured (tagmentation of restriction-digested lambda DNA). Analysis of a more complex situation (HIV replication) produces a picture that includes both the known termini of the reverse-transcribed genome (the PBS [primer binding site] on the negative strand and the PPT [3’ polypurine tract] on the positive strand) and a signal corresponding to a previously described additional initiation site for second strand synthesis (cPPT [central polypurine tract]). These results confirm the ability to detect DNA structural discontinuities in a pooled sample where high-throughput shotgun sequence data are available. In addition to the known initiation sequences in the HIV genome, we detect a signal of positive-strand DNA termini at several other positions on the plus-strand sequence.
These sites share several characteristics with the previously characterized second strand initiation sites (the cPPT and 3’ PPT sites): (i) an observed spike in directly captured cDNA ends, (ii) an indirect terminus signal evident in localized strand bias, (iii) a strong preference for forward-facing termini, (iv) an upstream purine-rich motif, and (v) a decrease in terminus signal at late time points after infection. These characteristics are consistent across duplicate samples in two different genotypes (wild type and integrase-lacking HIV). The observation of distinct internal termini associated with multiple purine-rich regions suggests that multiple internal initiations of second strand synthesis might contribute to HIV replication through acceleration of second strand synthesis and/or strand displacement at the HIV 3’ end.
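The indirect, strand-bias signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `strand_bias`, the input format (pairs of 0-based 5' read position and strand), the window size, and the pseudocount are all assumptions chosen for illustration. The idea is that, near a left-hand terminus, plus-strand (inward-facing) read starts should outnumber minus-strand ones, giving a positive log ratio, and the reverse should hold near a right-hand terminus.

```python
from math import log2

def strand_bias(read_starts, length, window=100, pseudocount=1.0):
    """Windowed log2 ratio of plus- to minus-strand read 5' ends.

    read_starts: iterable of (pos, strand), pos 0-based, strand '+' or '-'
                 (hypothetical input format, e.g. extracted from alignments).
    Returns a list with one log2 ratio per reference position.
    """
    # Prefix-sum arrays: plus[i] = number of plus-strand starts at positions < i.
    plus = [0] * (length + 1)
    minus = [0] * (length + 1)
    for pos, strand in read_starts:
        if 0 <= pos < length:
            if strand == "+":
                plus[pos + 1] += 1
            elif strand == "-":
                minus[pos + 1] += 1
    for i in range(1, length + 1):
        plus[i] += plus[i - 1]
        minus[i] += minus[i - 1]

    half = window // 2
    profile = []
    for p in range(length):
        lo = max(0, p - half)            # window clipped at the reference edges
        hi = min(length, p + half + 1)
        n_plus = plus[hi] - plus[lo]
        n_minus = minus[hi] - minus[lo]
        # The pseudocount avoids division by zero in windows with no reads.
        profile.append(log2((n_plus + pseudocount) / (n_minus + pseudocount)))
    return profile

# Toy data: plus-biased reads near the left edge, balanced reads in the
# middle, minus-biased reads near the right edge.
toy_reads = [(10, "+")] * 8 + [(12, "-")] + [(500, "+"), (500, "-")] + [(990, "-")] * 8
toy_profile = strand_bias(toy_reads, length=1000, window=20)
```

In this toy profile, strongly positive values mark a candidate left-hand terminus and strongly negative values a candidate right-hand terminus, while balanced regions stay near zero. A real analysis would also need to account for coverage depth and fragment-size distribution, which this sketch ignores.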
- Subjects :
- HIV reverse transcription
Restriction enzyme
Double-stranded DNA
Central polypurine tract (cPPT)
Illumina
Genetics
Adenovirus
Molecular genetics
Bacteriophage
Caenorhabditis elegans
Biology
Cancer
Innate immunity
Retrovirus
Next-generation sequencing (NGS)
DNA termini
HIV
Genomics
DNA
Bacteriophage lambda
Mitochondrial DNA
Mitochondria
Linear extrachromosomal DNA
HIV replication
Alternative polypurine tracts (altPPT)
High-throughput sequencing (HTS)
FOS: Biological sciences
Phage
RNA
Purine-rich sequences
Single-stranded DNA
Chlamydomonas reinhardtii
Extrachromosomal DNA
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........1b50bf32c0b796490d18219f72cc86fe
- Full Text :
- https://doi.org/10.25740/xf213fn4785