1. Robust Regression with Compositional Covariates
- Author
- Mishra, Aditya and Muller, Christian L.
- Subjects
- Statistics - Methodology, Statistics - Applications
- Abstract
- Many biological high-throughput data sets, such as targeted amplicon-based and metagenomic sequencing data, are compositional in nature. A common exploratory data analysis task is to infer statistical associations between the high-dimensional microbial compositions and habitat- or host-related covariates. We propose a general robust statistical regression framework, RobRegCC (Robust Regression with Compositional Covariates), which extends the linear log-contrast model by a mean shift formulation for capturing outliers. RobRegCC includes sparsity-promoting convex and non-convex penalties for parsimonious model estimation, a data-driven robust initialization procedure, and a novel robust cross-validation model selection scheme. We show RobRegCC's ability to perform simultaneous sparse log-contrast regression and outlier detection over a wide range of simulation settings and provide theoretical non-asymptotic guarantees for the underlying estimators. To demonstrate the seamless applicability of the workflow on real data, we consider a gut microbiome data set from HIV patients and infer robust associations between a sparse set of microbial species and host immune response from soluble CD14 measurements. All experiments are fully reproducible and available on GitHub at https://github.com/amishra-stats/robregcc.
- Comment
- 43 pages, 12 figures
- Published
- 2019
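
The mean shift extension of the linear log-contrast model mentioned in the abstract can be sketched in generic notation as follows. This is an illustrative sketch, not the authors' exact formulation: the symbols, the 1/(2n) scaling, the omitted intercept, and the use of two separate tuning parameters are all assumptions made here for readability.

```latex
% Assumed notation: y in R^n is the response, X is the n x p matrix of
% compositions (rows sum to one), and Z = log X is taken entrywise.
\begin{align}
  y &= Z\beta + \gamma + \varepsilon,
  \qquad \text{subject to } \mathbf{1}_p^\top \beta = 0, \\
  (\hat\beta, \hat\gamma) &\in
  \operatorname*{arg\,min}_{\beta,\,\gamma \,:\, \mathbf{1}_p^\top \beta = 0}
  \; \frac{1}{2n}\,\lVert y - Z\beta - \gamma \rVert_2^2
  + \lambda_\beta P(\beta) + \lambda_\gamma P(\gamma).
\end{align}
% gamma in R^n is the mean shift vector: a nonzero gamma_i marks
% observation i as a potential outlier.
% P(.) stands for a sparsity-promoting penalty (convex or non-convex).
```

Under these assumptions, the zero-sum constraint on the coefficients is what makes the model a log-contrast regression, and sparsity in the mean shift vector is what yields outlier detection alongside variable selection.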