1. Effects of missing data and data type on phylotranscriptomic analysis of stony corals (Cnidaria: Anthozoa: Scleractinia).
- Author
- Quek, Zheng Bin Randolph and Huang, Danwei
- Subjects
- *SCLERACTINIA, *INVERTEBRATE phylogeny, *MISSING data (Statistics), *ROBUST statistics, *AMINO acids
- Abstract
Highlights
• Phylotranscriptomic analysis is generally robust against data incompleteness.
• Phylotranscriptomic tree of stony corals is broadly consistent with Sanger-sequenced trees.
• DNA alignments outperform amino acid alignments in tree consistency and node support.
• Gene tree incongruity affects concatenated amino acid alignments more than DNA data.
Abstract
Across the tree of life, phylogenetic analysis is increasingly being performed using transcriptome data. As a result of heterogeneous gene expression within individual organisms and unequal sequencing depth between samples, coverage of homologous loci in such datasets is typically inhomogeneous. Consequently, missing data are a common feature of phylotranscriptomic inference, but their impact on phylogenetic analysis remains poorly characterised empirically. Considering the complexity of the evolutionary history of stony corals (Cnidaria: Anthozoa: Scleractinia), transcriptome data hold great promise for resolving their phylogeny, particularly if there is a good understanding of missing data and data type (either amino acid or DNA) effects. Here, we reconstructed a broad phylogenetic tree of 39 scleractinian species with 3 corallimorpharians as outgroups, including 15 transcriptomes that were newly sequenced and assembled in this study. Between 63 and 505 loci were used to analyse the scleractinian phylogeny, and we quantified differences in tree topology, tree shape, bootstrap support and effects of conflicting gene trees among datasets of varying completeness for both amino acid and DNA sequences. Even with almost 70% missing data, tree topologies appear to be mostly unaffected, although there are higher incongruence levels in the less complete datasets. Furthermore, DNA trees outperform amino acid trees in bootstrap support and robustness against incongruent loci. Overall, our findings indicate that high levels of missing data can still produce expected tree topologies, but identifying and omitting incongruent loci can lead to more consistent branch length estimates. [ABSTRACT FROM AUTHOR]
- Published
- 2019