Data-Driven Variable Decomposition for Treatment Effect Estimation.

Authors :: Kuang, Kun
Cui, Peng
Zou, Hao
Li, Bo
Tao, Jianrong
Wu, Fei
Yang, Shiqiang
Source :: IEEE Transactions on Knowledge & Data Engineering; May 2022, Vol. 34 Issue 5, p2120-2134, 15p
Publication Year :: 2022
Abstract: Causal inference plays an important role in decision making in many fields, such as social marketing, healthcare, and public policy. One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Confounding is generally controlled for with propensity score methods. However, these methods treat all observed variables as confounders and ignore the adjustment variables, which have no influence on the treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate confounders from adjustment variables in observational studies remains an open problem, especially with high-dimensional variables, which are common in the big data era. In this paper, we first propose a Data-Driven Variable Decomposition (D²VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate the treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we theoretically prove that our D²VD algorithm yields an unbiased estimate of the treatment effect and achieves lower variance than traditional propensity score based methods. Moreover, to address the challenges posed by high-dimensional variables and nonlinearity, we extend D²VD to a nonlinear version, namely the Nonlinear-D²VD (N-D²VD) algorithm. To validate the effectiveness of the proposed algorithms, we conduct extensive experiments on both synthetic and real-world datasets. The experimental results demonstrate that our D²VD and N-D²VD algorithms separate the variables precisely and estimate the treatment effect more accurately, with tighter confidence intervals, than state-of-the-art methods.
We also demonstrate that the features ranked highest by our algorithm achieve the best prediction performance on an online advertising dataset. [ABSTRACT FROM AUTHOR]
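The abstract's central claim is that adjustment variables, which affect the outcome but not the treatment, can reduce the variance of a propensity-score-based estimate. A small simulation can illustrate this. The Python/NumPy sketch below is NOT the authors' D²VD algorithm. It assumes the split into confounders `X` and adjustment variables `Z` is already known (an oracle split), whereas D²VD learns that split from data. It also swaps in a simple residualize-then-reweight step: OLS of the outcome on `Z`, followed by a normalized inverse-propensity-weighted estimate that uses a logistic propensity model on `X`. The baseline mimics the practice the abstract criticizes and feeds every observed variable into the propensity model. The data-generating process, coefficients, and all function names (`propensity_scores`, `ipw_ate`, `residualize`, `simulate`, `run_experiment`) are hypothetical.

```python
import numpy as np

# Hypothetical illustration, NOT the D²VD algorithm: the split of variables
# into confounders X and adjustment variables Z is assumed known (oracle).


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def propensity_scores(V, t, iters=25):
    """Fit a logistic model P(T=1 | V) by Newton's method; return fitted scores."""
    A = np.column_stack([np.ones(len(V)), V])  # prepend an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        p = sigmoid(A @ w)
        grad = A.T @ (t - p)  # gradient of the log-likelihood
        hess = A.T @ (A * (p * (1.0 - p))[:, None]) + 1e-9 * np.eye(A.shape[1])
        w += np.linalg.solve(hess, grad)  # Newton step
    # Clip to keep inverse-propensity weights bounded (an illustrative choice).
    return np.clip(sigmoid(A @ w), 0.01, 0.99)


def ipw_ate(y, t, e):
    """Normalized (Hajek) inverse-propensity-weighted estimate of the ATE."""
    w1 = t / e
    w0 = (1.0 - t) / (1.0 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def residualize(y, Z):
    """Remove the part of y linearly explained by Z (OLS with intercept)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


def simulate(n, rng, tau=1.0):
    X = rng.normal(size=(n, 3))  # confounders: affect both T and Y
    Z = rng.normal(size=(n, 3))  # adjustment variables: affect Y only
    t = rng.binomial(1, sigmoid(X @ np.array([0.8, -0.5, 0.3]))).astype(float)
    y = (tau * t
         + X @ np.array([1.0, 1.0, -1.0])
         + Z @ np.array([3.0, -2.0, 2.0])
         + rng.normal(size=n))
    return X, Z, t, y


def run_experiment(reps=200, n=2000, tau=1.0, seed=0):
    """Return (mean, sd) of both estimators across repeated simulations."""
    rng = np.random.default_rng(seed)
    base, decomp = [], []
    for _ in range(reps):
        X, Z, t, y = simulate(n, rng, tau)
        # Baseline: every observed variable is treated as a confounder.
        e_all = propensity_scores(np.column_stack([X, Z]), t)
        base.append(ipw_ate(y, t, e_all))
        # Oracle split: weight on X only, after removing Z's share of y.
        e_x = propensity_scores(X, t)
        decomp.append(ipw_ate(residualize(y, Z), t, e_x))
    base, decomp = np.array(base), np.array(decomp)
    return base.mean(), base.std(), decomp.mean(), decomp.std()


if __name__ == "__main__":
    mb, sb, md, sd = run_experiment()
    print(f"all-confounder IPW: mean={mb:.3f}, sd={sb:.3f}")
    print(f"oracle-split IPW:   mean={md:.3f}, sd={sd:.3f}")
```

Under this setup both estimators should center near the true effect `tau = 1.0`. The oracle-split estimator should show a noticeably smaller standard deviation across replications, because most of the outcome variance in the simulated data comes from `Z`. How large the gap is depends entirely on the assumed coefficients. The hard part that D²VD addresses, discovering which variables belong in `X` and which in `Z`, is deliberately skipped here.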