
CLASSIFICATION AND REGRESSION TREES AND FORESTS FOR INCOMPLETE DATA FROM SAMPLE SURVEYS.

Authors :
Wei-Yin Loh
John Eltinge
Moon Jung Cho
Yuanzhi Li
Source :
Statistica Sinica; Jan 2019, Vol. 29 Issue 1, p431-453, 23p, 1 Chart
Publication Year :
2019

Abstract

Analysis of sample survey data often requires adjustments for missing values in the variables of interest. Standard adjustments based on item imputation or on propensity weighting factors rely on the availability of auxiliary variables for both responding and non-responding units. Their application can be challenging when the auxiliary variables are numerous and are themselves subject to incomplete-data problems. This paper shows how classification and regression trees and forests can overcome these difficulties and compares them with likelihood methods in terms of bias and mean squared error. The development centers on a component of income data from the U.S. Consumer Expenditure Survey, which has a relatively high rate of item missingness. Classification trees and forests are used to model the unit-level propensity for item missingness in the income component. Regression trees and forests are used to model the conditional mean of the income component. The methods are then used to estimate the mean of the income component, adjusted for item nonresponse. Thirteen methods for estimating a population mean are compared in simulation experiments. The results show that if the number of auxiliary variables with missing values is not small, or if they have substantial missingness rates, likelihood methods can be impracticable or inapplicable. Tree and forest methods are always applicable, are relatively fast, and have higher efficiency than likelihood methods under real-data situations with incomplete-data patterns similar to that in the abovementioned survey. Their efficiency loss under parametric conditions most favorable to likelihood methods is observed to be between 10% and 25%. [ABSTRACT FROM AUTHOR]
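The two nonresponse adjustments named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each unit's tree leaf has already been computed (the tree-fitting step is omitted), uses a single tree rather than a forest, and treats the functions `propensity_weighted_mean` and `tree_imputed_mean` and the toy data as hypothetical illustrations. Within a classification-tree leaf, the response propensity is estimated by the weighted response rate; within a regression-tree leaf, the conditional mean is estimated by the weighted mean of the respondents.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence
from typing import Optional


def propensity_weighted_mean(
    y: Sequence[Optional[float]],
    weights: Sequence[float],
    cells: Sequence[Hashable],
) -> float:
    """Inverse-propensity (weighting-class) estimate of the population mean.

    Hypothetical sketch: cells[i] is the classification-tree leaf holding
    unit i (assumed precomputed); y[i] is None for item nonresponse.
    The estimated propensity in a leaf is its weighted response rate.
    """
    total = defaultdict(float)  # design weight of all units per leaf
    resp = defaultdict(float)   # design weight of respondents per leaf
    for yi, w, c in zip(y, weights, cells):
        total[c] += w
        if yi is not None:
            resp[c] += w
    num = den = 0.0
    for yi, w, c in zip(y, weights, cells):
        if yi is not None:
            # Adjusted weight = design weight / estimated propensity.
            # Units in leaves with no respondents are effectively dropped.
            adj = w * total[c] / resp[c]
            num += adj * yi
            den += adj
    return num / den


def tree_imputed_mean(
    y: Sequence[Optional[float]],
    weights: Sequence[float],
    cells: Sequence[Hashable],
) -> float:
    """Mean after imputing each nonrespondent with its leaf's respondent mean.

    Hypothetical sketch: cells[i] is the regression-tree leaf holding unit i;
    every leaf containing a nonrespondent must contain a respondent.
    """
    sum_wy = defaultdict(float)
    sum_w = defaultdict(float)
    for yi, w, c in zip(y, weights, cells):
        if yi is not None:
            sum_wy[c] += w * yi
            sum_w[c] += w
    num = den = 0.0
    for yi, w, c in zip(y, weights, cells):
        # Respondents keep their reported value; others get the leaf mean.
        value = yi if yi is not None else sum_wy[c] / sum_w[c]
        num += w * value
        den += w
    return num / den


if __name__ == "__main__":
    # Hypothetical toy data: None marks item nonresponse.
    y = [10.0, 12.0, None, 30.0, None, 34.0]
    weights = [1.0, 2.0, 1.0, 1.0, 2.0, 1.0]
    leaves = ["a", "a", "a", "b", "b", "b"]
    print(propensity_weighted_mean(y, weights, leaves))
    print(tree_imputed_mean(y, weights, leaves))
```

On the toy data both calls give about 21.67 (exactly 65/3), because they share the same leaves: with identical cells, weighting-class adjustment and cell-mean imputation coincide. The two approaches diverge in practice when the propensity and the conditional mean are modeled by separate classification and regression trees. A forest version would, for example, average the leaf-level propensities or predictions over many trees.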

Details

Language :
English
ISSN :
10170405
Volume :
29
Issue :
1
Database :
Complementary Index
Journal :
Statistica Sinica
Publication Type :
Academic Journal
Accession number :
142136784
Full Text :
https://doi.org/10.5705/ss.202017.0225