Robust Regression with Compositional Covariates

Authors :
Mishra, Aditya
Muller, Christian L.
Publication Year :
2019

Abstract

Many biological high-throughput data sets, such as targeted amplicon-based and metagenomic sequencing data, are compositional in nature. A common exploratory data analysis task is to infer statistical associations between the high-dimensional microbial compositions and habitat- or host-related covariates. We propose a general robust statistical regression framework, RobRegCC (Robust Regression with Compositional Covariates), which extends the linear log-contrast model by a mean shift formulation for capturing outliers. RobRegCC includes sparsity-promoting convex and non-convex penalties for parsimonious model estimation, a data-driven robust initialization procedure, and a novel robust cross-validation model selection scheme. We show RobRegCC's ability to perform simultaneous sparse log-contrast regression and outlier detection over a wide range of simulation settings and provide theoretical non-asymptotic guarantees for the underlying estimators. To demonstrate the seamless applicability of the workflow on real data, we consider a gut microbiome data set from HIV patients and infer robust associations between a sparse set of microbial species and host immune response from soluble CD14 measurements. All experiments are fully reproducible and available on GitHub at https://github.com/amishra-stats/robregcc.

Comment: 43 pages, 12 figures
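
The idea of a log-contrast model extended by a mean shift can be illustrated with a small sketch. It is not the RobRegCC implementation and does not use the robregcc package: the function names, the simulated data, and the threshold `lam` are all assumptions made for illustration, and the sparsity penalty on the regression coefficients, the robust initialization, and the robust cross-validation mentioned in the abstract are omitted. The sketch assumes the model y = b0 + log(X) beta + gamma + noise with sum(beta) = 0, where gamma is a per-sample shift that is nonzero only for outliers, and fits it by alternating least squares with soft-thresholding of the residuals.

```python
import numpy as np


def soft_threshold(r, lam):
    """Shrink each entry of r toward zero by lam (L1 proximal step)."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)


def zero_sum_basis(p):
    """Orthonormal basis (p x (p-1)) of the subspace {beta : sum(beta) = 0}."""
    M = np.column_stack([np.ones(p), np.eye(p)[:, : p - 1]])
    Q, _ = np.linalg.qr(M)   # first column of Q is proportional to the ones vector
    return Q[:, 1:]          # remaining columns are orthogonal to it


def fit_mean_shift_log_contrast(X, y, lam, n_iter=200):
    """Toy alternating fit (hypothetical helper, not the paper's algorithm).

    X   : (n, p) compositions with strictly positive entries
    y   : (n,) response
    lam : threshold on the mean shift; larger values flag fewer outliers
    """
    n, p = X.shape
    B = zero_sum_basis(p)
    # Intercept plus log-covariates expressed in the zero-sum basis.
    A = np.column_stack([np.ones(n), np.log(X) @ B])
    gamma = np.zeros(n)
    for _ in range(n_iter):
        # Least squares on the response with the current shifts removed.
        coef, *_ = np.linalg.lstsq(A, y - gamma, rcond=None)
        # Residuals larger than lam in magnitude are absorbed into gamma.
        gamma = soft_threshold(y - A @ coef, lam)
    return coef[0], B @ coef[1:], gamma


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.dirichlet(np.full(p, 2.0), size=n)         # rows sum to one
beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0])  # sums to zero
y = 0.3 + np.log(X) @ beta_true + 0.1 * rng.standard_normal(n)
y[:5] += 8.0                                       # first five samples are outliers

b0_hat, beta_hat, gamma_hat = fit_mean_shift_log_contrast(X, y, lam=0.5)
print("estimated beta:", np.round(beta_hat, 2))
print("flagged samples:", np.flatnonzero(gamma_hat))
```

Samples with a nonzero entry in `gamma_hat` are the ones this sketch treats as outliers. Working in a basis of the zero-sum subspace enforces the log-contrast constraint exactly, at the cost of requiring more samples than components; the penalized estimators described in the abstract are what address the high-dimensional case.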

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1909.04990
Document Type :
Working Paper