1. Which Variable Should Be Dependent in Phylogenetic Generalized Least Squares Regression Analysis Under Pagel's Lambda Model
- Authors
Zheng-Lin Chen, Hong-Ji Guo, and Deng-Ke Niu
- Abstract
Phylogenetic generalized least squares (PGLS) regression is widely used to analyze evolutionary correlations between two traits. Conventional correlation methods, such as Pearson's and Spearman's rank correlation tests, treat the two analyzed traits equally. PGLS regression, in contrast, treats the independent and dependent variables asymmetrically when it accounts for phylogenetic non-independence and estimates some evolutionary parameters. We investigated the correlations of CRISPR-Cas and prophage contents with optimal growth temperature and minimal doubling time using Pagel's λ model and found that switching the independent and dependent variables produced conflicting results in 26.3% of cases. To better understand this phenomenon, we conducted 12,000 simulations of the evolution of two traits (X1 and X2) along a binary tree with 100 terminal nodes, using different models and variances. Switching the dependent and independent variables in PGLS regression analysis under Pagel's λ model produced conflicting outcomes in 17.2% of these simulations. To assess the correlation in each simulation, we established a gold standard by analyzing changes in traits along phylogenetic branches. Next, we tested seven potential criteria for selecting the dependent variable: log-likelihood, Akaike information criterion, R², p-value, Pagel's λ, Blomberg et al.'s K, and the estimated λ in Pagel's λ model. The last three criteria performed equally well in selecting the dependent variable and were superior to the other four. Because Pagel's λ and Blomberg et al.'s K are indicators of phylogenetic signal and are commonly calculated at the beginning of phylogenetic comparative studies, we suggest, for practicality, using the trait with the higher λ or K value as the dependent variable in future PGLS regressions. Logical cause-and-effect analysis should be conducted after a significant correlation has been established by PGLS regression, rather than being used to decide which variable is dependent.
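The selection rule recommended in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `lambda_transform` and `choose_dependent`, the toy covariance matrix, and the signal values are all hypothetical. The sketch assumes the usual definition of Pagel's λ transformation, in which the off-diagonal elements of the phylogenetic variance-covariance matrix are multiplied by λ while the diagonal is left unchanged, and it assumes λ is restricted to the interval [0, 1].

```python
from typing import Dict, List, Tuple


def lambda_transform(vcv: List[List[float]], lam: float) -> List[List[float]]:
    """Apply Pagel's lambda to a phylogenetic variance-covariance matrix.

    Off-diagonal covariances are multiplied by lambda; the diagonal is kept.
    lambda = 1 leaves the matrix unchanged, lambda = 0 removes all covariance
    between species (no phylogenetic signal).
    """
    # Assumption of this sketch: lambda is bounded to [0, 1].
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda is expected to lie in [0, 1] in this sketch")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


def choose_dependent(signal: Dict[str, float]) -> Tuple[str, str]:
    """Return (dependent, independent) for two traits.

    `signal` maps each trait name to its phylogenetic signal (Pagel's lambda
    or Blomberg et al.'s K). The trait with the higher value is chosen as the
    dependent variable, following the suggestion in the abstract.
    """
    if len(signal) != 2:
        raise ValueError("expected exactly two traits")
    (name_a, value_a), (name_b, value_b) = signal.items()
    if value_a == value_b:
        # The rule gives no guidance for a tie; leave the decision to the user.
        raise ValueError("equal signal values: the rule does not decide")
    return (name_a, name_b) if value_a > value_b else (name_b, name_a)


if __name__ == "__main__":
    # Hypothetical three-species covariance matrix (shared branch lengths).
    toy_vcv = [
        [1.0, 0.6, 0.2],
        [0.6, 1.0, 0.2],
        [0.2, 0.2, 1.0],
    ]
    print(lambda_transform(toy_vcv, 0.5))

    # Hypothetical signal values for the two simulated traits.
    dependent, independent = choose_dependent({"X1": 0.9, "X2": 0.3})
    print(f"regress {dependent} on {independent}")
```

In practice the signal values would come from a phylogenetic comparative package rather than being typed in by hand; the helper only encodes the decision once those values are available.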
- Published
- 2023