Sparse semiparametric canonical correlation analysis for data of mixed types

Authors :
Grace Yoon
Raymond J. Carroll
Irina Gaynanova
Source :
Biometrika
Publication Year :
2020
Publisher :
Oxford University Press (OUP), 2020.

Abstract

Canonical correlation analysis investigates linear relationships between two sets of variables, but often works poorly on modern data sets due to high dimensionality and mixed data types such as continuous, binary and zero-inflated. To overcome these challenges, we propose a semiparametric approach for sparse canonical correlation analysis based on the Gaussian copula. Our main contribution is a truncated latent Gaussian copula model for data with excess zeros, which allows us to derive a rank-based estimator of the latent correlation matrix for mixed variable types without estimating the marginal transformation functions. The resulting canonical correlation analysis method works well in high-dimensional settings, as demonstrated via numerical studies and in an application to the analysis of association between gene expression and microRNA data of breast cancer patients.

Accepted to Biometrika. Main text: 19 pages and 3 figures. Supplementary material: 28 pages and 9 figures.
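
As a rough illustration of what a rank-based estimator of a latent correlation looks like, the sketch below covers only the simplest case of two continuous variables under a Gaussian copula, where the standard identity r = sin(pi/2 * tau) links Kendall's tau to the latent correlation. It is not the paper's estimator for binary or zero-inflated (truncated) variables, whose bridge functions are not reproduced here. The function names `kendall_tau` and `latent_correlation_continuous` and the simulated data are assumptions made for this example.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau for samples without ties (O(n^2) pairwise count)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def latent_correlation_continuous(x, y):
    """Continuous-continuous case only: invert tau via sin(pi/2 * tau)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


# Simulated data (an assumption for illustration): latent bivariate normal
# with correlation rho, observed through unknown monotone transformations.
random.seed(0)
rho = 0.6
n = 1000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in z1]
x = [math.exp(a) for a in z1]  # monotone transform of the first margin
y = [b ** 3 for b in z2]       # monotone transform of the second margin

# Ranks are unchanged by the monotone transforms, so the margins never
# need to be estimated to recover the latent correlation.
estimate = latent_correlation_continuous(x, y)
print(estimate)
```

Because Kendall's tau depends only on the ordering of the observations, the estimate is unaffected by the exponential and cubic transformations, which mirrors the abstract's point about avoiding estimation of the marginal transformation functions.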

Details

ISSN :
1464-3510 and 0006-3444
Volume :
107
Database :
OpenAIRE
Journal :
Biometrika
Accession number :
edsair.doi.dedup.....8078fbb9e3587583a298620bdbd6ea1f
Full Text :
https://doi.org/10.1093/biomet/asaa007