A Protocol to Extract a Specific Genomic Region from a Public Whole-Genome Database and Modify Analytical Bin Length for Population Genetic Studies

Authors :
Muhammad Shoaib Akhtar
Shoji Kawamura
Source :
Methods and Protocols, Vol 7, Iss 4, p 57 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

With the advent of “next-generation” sequencing and the continuous reduction in sequencing costs, an increasing amount of genomic data has become available, including whole-genome, whole-exome, and targeted sequencing data. These applications are popular not only in mega sequencing projects, such as the 1000 Genomes Project and UK Biobank, but also among individual researchers. Evolutionary genetic analyses, such as the dN/dS ratio and Tajima’s D, are increasingly in demand for whole-genome-level population data. These analyses are often carried out with a uniform, user-defined bin size across the genome. However, they require subdivision of a genomic region into functional units, such as protein-coding regions, introns, and untranslated regions, and computing these genetic measures for large-scale data remains challenging. In a recent investigation, we devised a method to address this issue. The method requires a multi-sample VCF file containing population data, a reference genome, target regions in a BED file, and a list of samples to be included in the analysis. Because the targeted regions are extracted into a new VCF file, population genetic analysis can then be restricted to those regions. Using this approach, we conducted Tajima’s D analysis on intact genes and pseudogenes, as well as on non-coding regions.
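
The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol, and this record does not state which tools the protocol uses. The function names (`parse_bed`, `alt_counts_in_regions`, `tajimas_d`) are hypothetical, and the sketch assumes diploid, biallelic genotype calls in an uncompressed VCF held in memory. It omits the reference genome input, skips sites with missing or multi-allelic genotypes, and computes Tajima's D with the standard constants from Tajima (1989).

```python
import math


def parse_bed(bed_text):
    """Parse BED text into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def alt_counts_in_regions(vcf_text, regions, samples):
    """Return per-site ALT allele counts for the chosen samples inside the regions.

    Assumption: diploid GT calls. Sites with missing or multi-allelic
    genotypes among the chosen samples are skipped.
    """
    columns, counts = [], []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the columns of the requested samples in the header.
            columns = [fields.index(name) for name in samples]
            continue
        chrom, pos = fields[0], int(fields[1])
        # VCF POS is 1-based; BED intervals are 0-based and half-open.
        if not any(c == chrom and s < pos <= e for c, s, e in regions):
            continue
        alleles = []
        for i in columns:
            genotype = fields[i].split(":")[0].replace("|", "/")
            alleles.extend(genotype.split("/"))
        if len(alleles) == 2 * len(samples) and set(alleles) <= {"0", "1"}:
            counts.append(alleles.count("1"))
    return counts


def tajimas_d(alt_counts, n):
    """Tajima's D from per-site ALT allele counts among n sampled chromosomes.

    Returns None when D is undefined (no segregating sites, or n < 4).
    """
    segregating = [k for k in alt_counts if 0 < k < n]
    s = len(segregating)
    if s == 0 or n < 4:
        return None
    # Nucleotide diversity: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in segregating)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / math.sqrt(variance)
```

In this sketch, `n` would be twice the number of selected samples, and each functional unit (for example, one BED entry per exon or intron) could be passed separately to obtain a per-unit value instead of a fixed-width bin. For real datasets, an indexed, compressed VCF and a dedicated VCF toolkit would be the practical choice.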

Details

Language :
English
ISSN :
24099279
Volume :
7
Issue :
4
Database :
Directory of Open Access Journals
Journal :
Methods and Protocols
Publication Type :
Academic Journal
Accession number :
edsdoj.8a1b65b8efd4c7b9eb207a690d416e3
Document Type :
article
Full Text :
https://doi.org/10.3390/mps7040057