1. Accounting for Complex Sample Designs in Analyses of the Survey of Consumer Finances
- Author
- Su Hyun Shin and Sherman D. Hanna
- Subjects
- 0301 basic medicine; education.field_of_study; 030109 nutrition & dietetics; Sociology and Political Science; 05 social sciences; Population; Logit; Regression analysis; Logistic regression; 03 medical and health sciences; Standard error; 0502 economics and business; Ordinary least squares; Statistics; Econometrics; 050211 marketing; education; Psychology; General Economics, Econometrics and Finance; Unit-weighted regression; Statistical hypothesis testing
- Abstract
- We examined the effects of using bootstrap weights to account for the complex sample design in analyses of Survey of Consumer Finances (SCF) datasets. No article published in this journal that has used the SCF has mentioned the issue of complex sample designs. We compared results obtained without weights and with application of population and bootstrap weights in a logistic regression, and found no substantial differences between the unweighted and the weighted analyses. We also compared results for an ordinary least squares regression, and found few differences between unweighted and weighted models. Unweighted regressions produced more conservative significance tests than their weighted counterparts, and some econometricians have suggested that unweighted analyses are better for hypothesis testing. If estimation of the magnitudes of effects is important, weighted regression may be better because it produces consistent estimators. Researchers should be cautious in drawing conclusions when weighted and unweighted effects are substantially different.

Lindamood, Hanna, and Bi (2007) reviewed articles that used the Survey of Consumer Finances (SCF) datasets and appeared in the Journal of Consumer Affairs. They examined the papers for lack of transparency with respect to a number of methodological issues, including weighting of multivariate analyses and the use of the Repeated Implicate Inference (RII) method. However, no articles in this journal that have used SCF datasets have included a discussion of the issue of complex sample designs (Nielsen and Seay 2014; Nielsen et al. 2009). What is the effect on standard error estimates and hypothesis testing when complex sample designs are ignored? Would a consideration of complex sample designs have changed the major conclusions of analyses of SCF datasets?
For our comparisons, we used the logit model of Yuh and Hanna (2010), who analyzed a combination of the 1995-2004 SCF datasets, and employed a normative economic framework to create their hypotheses. We also used an ordinary least squares (OLS) model similar to the Yuh and Hanna logit analyses. We used the 2010 SCF dataset, and slightly modified some of their specifications. We focused on providing comparisons between an unweighted multivariate analysis of an SCF dataset, one that applies population weights only, and one that uses both population and bootstrap replicate weights. We suggest guidelines for how to choose between unweighted and weighted models.

BRIEF LITERATURE REVIEW

The issue of weighting of multivariate analyses has been controversial. Winship and Radbill (1994) stated that the decision to use sampling weights in regression analysis is complicated, and unweighted regression is preferred if the sampling weights are a function of the independent variables. Deaton (1997, 66) mentioned "the old and still controversial issue of whether survey weights should be used in regression." He stated that the answer should be based on the purpose of the regression, and that "the strongest argument for weighted regression comes from those who regard regression as descriptive, not structural" (Deaton 1997, 71). He suggested that researchers adopt the approach taken by DuMouchel and Duncan (1983) to compare weighted and unweighted estimates (Deaton 1997, 72).

Lindamood et al. (2007) compared unweighted versus population-weighted analyses of an SCF dataset for three different logistic regressions. They reported that of 99 coefficient estimates for the three logits, nine had a difference between unweighted and weighted in terms of whether the statistical significance level was less than .05 (nine were significant in the weighted model but not significant in the unweighted model). Only one of the 99 coefficients had a statistically significant effect in the unweighted estimate and a nonsignificant effect in the weighted estimate. They cited Deaton's (1997, 66-73) discussion that weighted regression analysis is suspect for hypothesis testing on datasets with endogenous weights, and recommended that if hypothesis testing is the main research focus, unweighted regression analysis should be used for SCF datasets. …
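
The three-way comparison described above (unweighted, population-weighted, and population plus bootstrap replicate weights) can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic, the names `pop_w` and `rep_weights` are hypothetical, and the replicate weights are simulated as bootstrap draw counts multiplied by the population weight. The actual SCF replicate weight files and the RII combination across implicates are not reproduced here.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy for b."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def replicate_se(X, y, rep_weights):
    """Standard errors taken as the spread of the coefficient
    estimates across the sets of replicate weights."""
    estimates = np.array([wls(X, y, w) for w in rep_weights])
    return estimates.std(axis=0, ddof=1)

# Synthetic data (assumption: stands in for a survey dataset).
rng = np.random.default_rng(0)
n, n_reps = 500, 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Hypothetical population weights, one per respondent.
pop_w = rng.uniform(0.5, 3.0, size=n)

# Hypothetical replicate weights: number of times each case is drawn
# in a bootstrap resample, multiplied by its population weight.
counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_reps)
rep_weights = counts * pop_w

b_unweighted = wls(X, y, np.ones(n))      # no weights
b_weighted = wls(X, y, pop_w)             # population weights only
se_replicate = replicate_se(X, y, rep_weights)

print("unweighted coefficients:", b_unweighted)
print("weighted coefficients:  ", b_weighted)
print("replicate-weight SEs:   ", se_replicate)
```

Setting the unweighted and weighted coefficients side by side follows the spirit of the DuMouchel and Duncan (1983) comparison that Deaton recommends; a large gap between them is the situation in which the abstract urges caution.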
- Published
- 2016