Simulation‐selection‐extrapolation: Estimation in high‐dimensional errors‐in‐variables models
- Source :
- Biometrics. 75:1133-1144
- Publication Year :
- 2019
- Publisher :
- Wiley, 2019.
Abstract
- Errors-in-variables models in high-dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation-SELection-EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline-based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.
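The simulation and extrapolation steps named in the abstract can be illustrated with a minimal sketch of classical SIMEX for a single error-prone covariate. This is not the authors' implementation: it omits the lasso fits and the group-lasso selection step entirely, assumes the measurement error standard deviation `sigma_u` is known, and uses a quadratic extrapolant. All function names (`simulate_data`, `naive_slope`, `fit_quadratic`, `simex_slope`) are hypothetical and introduced only for illustration.

```python
import random


def simulate_data(n, beta, sigma_u, seed=0):
    """Hypothetical toy data: y = beta * x + noise, observed w = x + u."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]  # error-prone covariate
    return w, y


def naive_slope(w, y):
    """Least-squares slope of y on w, ignoring measurement error."""
    n = len(w)
    mw, my = sum(w) / n, sum(y) / n
    sxy = sum((a - mw) * (b - my) for a, b in zip(w, y))
    sxx = sum((a - mw) ** 2 for a in w)
    return sxy / sxx


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit ys ~ a + b*x + c*x^2 via the normal equations."""
    ata = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear(ata, aty)


def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0),
                n_rep=20, seed=0):
    """Simulation step: add extra noise with variance lambda * sigma_u^2.
    Extrapolation step: fit a quadratic in lambda, evaluate at lambda = -1."""
    rng = random.Random(seed)
    means = []
    for lam in lambdas:
        if lam == 0.0:
            means.append(naive_slope(w, y))  # no extra noise needed
            continue
        scale = (lam ** 0.5) * sigma_u
        estimates = []
        for _ in range(n_rep):
            w_star = [wi + scale * rng.gauss(0.0, 1.0) for wi in w]
            estimates.append(naive_slope(w_star, y))
        means.append(sum(estimates) / n_rep)
    a, b, c = fit_quadratic(list(lambdas), means)
    return a - b + c  # quadratic evaluated at lambda = -1


if __name__ == "__main__":
    w, y = simulate_data(n=2000, beta=2.0, sigma_u=0.5, seed=1)
    print("naive slope:", naive_slope(w, y))
    print("SIMEX slope:", simex_slope(w, y, sigma_u=0.5))
```

In this toy setting the naive slope is attenuated toward zero, and extrapolating the averaged estimates back to `lambda = -1` (the notional error-free case) should move the estimate closer to the true coefficient. Per the abstract, SIMSELEX uses lasso fits in the simulation step and inserts a group-lasso selection step before extrapolation, neither of which is shown here.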
- Subjects :
- Statistics and Probability
Generalized linear model
Computer science
Feature selection
Wilms Tumor
01 natural sciences
General Biochemistry, Genetics and Molecular Biology
010104 statistics & probability
03 medical and health sciences
Covariate
Methods
Humans
0101 mathematics
Proportional Hazards Models
030304 developmental biology
0303 health sciences
Models, Statistical
General Immunology and Microbiology
Gene Expression Profiling
Applied Mathematics
Linear model
Estimator
Regression analysis
General Medicine
Microarray Analysis
Logistic Models
Sample size determination
Sample Size
Linear Models
Errors-in-variables models
Scientific Experimental Error
General Agricultural and Biological Sciences
Algorithm
Details
- ISSN :
- 1541-0420 and 0006-341X
- Volume :
- 75
- Database :
- OpenAIRE
- Journal :
- Biometrics
- Accession number :
- edsair.doi.dedup.....4fc3f1a06cc1819f405db172ffa235c9