1. Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping-by-sequencing data from natural populations
- Author
- James E. Seeb, Garrett J. McKinney, Ryan K. Waples, and Lisa W. Seeb
- Subjects
0301 basic medicine; Heterozygote; Berberis; animal structures; Genotype; Genotyping Techniques; Population; Locus (genetics); Biology; Genome; 03 medical and health sciences; Salmon; Databases, Genetic; Genetics; medicine; Animals; Allele; education; Ecology, Evolution, Behavior and Systematics; education.field_of_study; fungi; Chromosome Mapping; medicine.disease; Genetics, Population; 030104 developmental biology; Genetic Loci; Tetrasomy; Ploidy; Biotechnology; Reference genome
- Abstract
Whole genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in pervasive paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to genotype reliably and are often excluded from genotyping-by-sequencing (GBS) analyses; however, removal requires that paralogs first be identified, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: (1) the expected frequency of heterozygotes exceeds that for singleton loci, and (2) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low; however, we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha), which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole genome duplication; mountain barberry (Berberis alpina), which has a large proportion of paralogs that arose through an unknown mechanism; and dusky parrotfish (Scarus niger), which has largely re-diploidized following an ancient whole genome duplication. Importantly, this approach requires only the genotype and allele-specific read counts for each individual, information that is readily obtained from most GBS analysis pipelines.
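The two per-locus quantities described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `summarize_locus`, the genotype coding (`"AA"`, `"AB"`, `"BB"`, `None` for missing), and the choice of a binomial z-score to measure deviation from 1:1 are all assumptions made for illustration; the abstract does not say which statistic is used.

```python
import math
from dataclasses import dataclass


@dataclass
class LocusSummary:
    het_proportion: float  # heterozygotes / genotyped individuals
    allele_ratio: float    # allele-A reads / all reads, summed over heterozygotes
    z_score: float         # deviation of summed allele-A reads from a 1:1 expectation


def summarize_locus(genotypes, reads_a, reads_b):
    """Summarize one locus across a population sample.

    genotypes: list of "AA", "AB", "BB", or None (hypothetical coding).
    reads_a, reads_b: per-individual read counts for each allele.
    """
    called = [i for i, g in enumerate(genotypes) if g is not None]
    if not called:
        raise ValueError("no genotyped individuals at this locus")
    hets = [i for i in called if genotypes[i] == "AB"]
    het_proportion = len(hets) / len(called)

    # Sum reads over all heterozygotes to gain power at low coverage.
    sum_a = sum(reads_a[i] for i in hets)
    total = sum_a + sum(reads_b[i] for i in hets)
    if total == 0:
        return LocusSummary(het_proportion, float("nan"), float("nan"))

    # Assumed test statistic: under 1:1 sampling, allele-A reads follow
    # Binomial(total, 0.5), with mean total/2 and variance total/4.
    z_score = (sum_a - 0.5 * total) / math.sqrt(0.25 * total)
    return LocusSummary(het_proportion, sum_a / total, z_score)


if __name__ == "__main__":
    # Hypothetical data: every individual is heterozygous with a 3:1 read skew.
    print(summarize_locus(["AB"] * 4, [15] * 4, [5] * 4))
```

A locus with both a high proportion of heterozygotes and a large absolute z-score would be a candidate paralog under this sketch. Any cutoffs would need to be chosen by inspecting the data, since the abstract gives no thresholds.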
- Published
- 2016