Pandagma: a tool for identifying pan-gene sets and gene families at desired evolutionary depths and accommodating whole-genome duplications.

Authors :: Cannon, Steven B
Lee, Hyun-Oh
Weeks, Nathan T
Berendzen, Joel
Source :: Bioinformatics. Sep2024, Vol. 40 Issue 9, p1-4. 4p.
Publication Year :: 2024
Abstract: Summary: Identification of allelic or corresponding genes (pan-genes) within a species or genus is important for discovery of biologically significant genetic conservation and variation. Similarly, identification of orthologs (gene families) across wider evolutionary distances is important for understanding the genetic basis for similar or differing traits. Especially in plants, several complications make identification of pan-genes and gene families challenging, including whole-genome duplications, evolutionary rate differences among lineages, and varying qualities of assemblies and annotations. Here, we document and distribute a set of workflows that we have used to address these problems. Results: Pandagma is a set of configurable workflows for identifying and comparing pan-gene sets and gene families for annotation sets from eukaryotic genomes, using a combination of homology, synteny, and expected rates of synonymous change in coding sequence. Availability and implementation: The Pandagma workflows, example configurations, implementation details, and scripts for retrieving public datasets are available at https://github.com/legumeinfo/pandagma. [ABSTRACT FROM AUTHOR]

Subjects :: *EUKARYOTIC genomes
*GENETIC variation
*ANNOTATIONS
*SPECIES
*SCRIPTS
