1. Whole-genome sequence of the oriental lung fluke Paragonimus westermani
- Author
- Donald P. McManus, Malcolm K. Jones, Geoffrey N. Gobert, Kanwar Narain, Mark A. Ragan, Takeshi Agatsuma, Harald Oey, Lutz Krause, K Rekha Devi, Martha Zakrzewski, and Sujeevi Nawaratna
- Subjects
0106 biological sciences, Paragonimus westermani, oriental lung fluke, Health Informatics, comparative genomics, Data Note, whole-genome sequence, 01 natural sciences, Genome, 03 medical and health sciences, Genome Size, Paragonimus, Sequence Homology, Nucleic Acid, parasitic diseases, medicine, Animals, Repeated sequence, Genome size, Phylogeny, 030304 developmental biology, Paragonimiasis, Whole genome sequencing, Genetics, Genome, Helminth, 0303 health sciences, Whole Genome Sequencing, biology, paragonimiasis, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, biology.organism_classification, medicine.disease, Computer Science Applications, Long interspersed nuclear element, neglected tropical disease, parasitic infection, Genome, Mitochondrial, genome assembly, food-borne disease, flatworm, 010606 plant biology & botany
- Abstract
Background: Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available.
Results: We sequenced and assembled the genome of P. westermani, which, at an estimated size of 1.1 Gb, is among the largest known pathogen genomes. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of the estimated genome size. The genome has a high proportion (45%) of repeat-derived DNA, particularly of the long interspersed element and long terminal repeat subtypes, and the expansion of these elements may partly explain the large genome size. We predicted 12,852 protein-coding genes, which show a high level of conservation with related trematode species. The majority of proteins (80%) had homologs in the human liver fluke Opisthorchis viverrini, with an average sequence identity of 64.1%. Assembly of the P. westermani mitochondrial genome from long PacBio reads resulted in a single high-quality circularized 20.6 kb contig. The contig harbored a 6.9 kb region of non-coding repetitive DNA composed of three distinct repeat units. Our results suggest that the region is highly polymorphic in P. westermani, possibly even within single worm isolates.
Conclusions: The generated assembly represents the first Paragonimus genome sequence and will facilitate future molecular studies of this important, but neglected, parasite group.
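The proportions quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the published analysis; the helper name `percent` and the conversion 1 Gb = 1000 Mb are assumptions made for this example.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole (hypothetical helper)."""
    return 100.0 * part / whole

# Figures taken from the abstract.
assembly_mb = 922.8
estimated_genome_mb = 1.1 * 1000  # 1.1 Gb, assuming 1 Gb = 1000 Mb
mito_contig_kb = 20.6
mito_repeat_kb = 6.9

# Share of the estimated genome covered by the assembly.
assembly_coverage = percent(assembly_mb, estimated_genome_mb)

# Share of the mitochondrial contig taken up by the non-coding repeat region.
mito_repeat_share = percent(mito_repeat_kb, mito_contig_kb)

print(f"Assembly covers {assembly_coverage:.1f}% of the estimated genome")
print(f"Repeat region spans {mito_repeat_share:.1f}% of the mitochondrial contig")
```

Rounded to the nearest whole percent, the first figure matches the 84% coverage stated in the abstract. The second suggests that roughly a third of the mitochondrial contig is repetitive non-coding DNA, a value the abstract does not state directly.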
- Published
- 2018