1. High-dimensional survival data analysis and its application to microarray data.
- Author
- Wang, Sijian
- Subjects
- Application, High-dimensional Data, Microarray Data, Regularization, Survival Analysis
- Abstract
In this dissertation, we consider high-dimensional survival data analysis, which is becoming increasingly important in modern statistical research. In Chapter 2, we develop a doubly penalized Buckley-James method, an extension of elastic net regression (Zou and Hastie, 2005) to right-censored survival data. By applying a mixture of the L1-norm and L2-norm penalties, as in elastic net regression, our method not only carries out automatic variable selection and parameter estimation simultaneously, but also automatically selects (or removes) highly correlated variables together. We also introduce the uniform design method of Fang and Wang (1994) for selecting multi-dimensional tuning parameters, which is much more efficient than the simple grid search commonly used in the literature. In Chapter 3, we propose a novel hierarchically penalized Cox regression method to address the group variable selection problem in Cox's proportional hazards model with grouped predictor variables. Our method performs variable selection at both the group level and the within-group level, and offers the potential to achieve the asymptotic oracle property of Fan and Li (2001, 2002). In Chapter 4, we propose a novel random lasso method for variable selection, whose idea mimics the random forest method (Breiman, 2001). By drawing bootstrap samples from the original training set and randomly selecting candidate variables, the average of the predictive models fitted on the bootstrap samples improves on the lasso in two respects: it selects or removes highly correlated variables more effectively, and the number of selected variables is not limited by the sample size. Furthermore, our method does not force the coefficients of highly correlated variables to be close to each other, as elastic net regression tends to do, and therefore offers more flexibility in estimating coefficients, especially when they have different signs.
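The bootstrap-and-average idea behind the random lasso described in Chapter 4 can be sketched as follows. This is a minimal one-stage illustration only, not the dissertation's actual procedure (the full method involves further refinements such as importance-based candidate selection); the function names, the choice of a plain coordinate-descent lasso solver, and all tuning values here are assumptions for the sketch, and predictors are assumed roughly standardized.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used by coordinate-descent lasso.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    # Simple coordinate-descent lasso: minimizes
    # (1/2)||y - X beta||^2 + lam * n * ||beta||_1.
    n, p = X.shape
    beta = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) + 1e-12  # guard against zero columns
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, lam * n) / col_norms[j]
    return beta

def random_lasso(X, y, B=50, q=None, lam=0.1, seed=0):
    # One-stage sketch: for each of B bootstrap samples, randomly pick
    # q candidate variables, fit a lasso on that subset, and average
    # the resulting coefficient vectors over all B fits.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = q or max(1, p // 2)
    avg = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, n)                   # bootstrap rows
        cols = rng.choice(p, size=q, replace=False)   # random candidates
        b = lasso_cd(X[np.ix_(idx, cols)], y[idx], lam)
        coef = np.zeros(p)
        coef[cols] = b
        avg += coef
    return avg / B
```

Because each bootstrap fit sees only a random subset of variables, two highly correlated predictors are often considered separately rather than competing within one fit, and the averaged model can assign nonzero coefficients to more variables than any single lasso fit on n observations could select.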
- Published
- 2008