
Support vector regression for polyhedral and missing data

Authors :
Gianluca Gazzola
Myong K. Jeong
Source :
Annals of Operations Research. 303:483-506
Publication Year :
2020
Publisher :
Springer Science and Business Media LLC, 2020.

Abstract

We introduce “Polyhedral Support Vector Regression” (PSVR), a regression model for data represented by arbitrary convex polyhedral sets. PSVR is derived as a generalization of support vector regression, in which the data is represented by individual points along input variables $$X_1, X_2, \ldots, X_p$$ and output variable $$Y$$, and it extends a support vector classification model previously introduced for polyhedral data. PSVR is in essence a robust-optimization model, which defines prediction error as the largest deviation, calculated along $$Y$$, between an interpolating hyperplane and all points within a convex polyhedron; the model relies on the affine Farkas’ lemma to make this definition computationally tractable within the formulation. As an application, we consider the problem of regression with missing data, where we use convex polyhedra to model the multivariate uncertainty involving the unobserved values in a data set. For this purpose, we discuss a novel technique that builds on multiple imputation and principal component analysis to estimate convex polyhedra from missing data, and on a geometric characterization of such polyhedra to define observation-specific hyper-parameters in the PSVR model. We show that an appropriate calibration of such hyper-parameters can have a significantly beneficial impact on the model’s performance. Experiments on both synthetic and real-world data illustrate that PSVR performs competitively with, or better than, other benchmark methods, especially on data sets with a high degree of missingness.
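To make the robust error definition concrete, the following minimal sketch computes the largest deviation along Y between a hyperplane y = w·x + b and a bounded convex polyhedron (a polytope). It assumes the polytope is given by its list of vertices. This vertex enumeration is a swapped-in illustration and not the PSVR formulation: the abstract states that PSVR instead uses the affine Farkas' lemma so that the worst case can be handled inside the optimization problem without enumerating points. The function names (`residual`, `worst_case_deviation`, `eps_insensitive_loss`) are hypothetical. The sketch relies on the fact that |y - (w·x + b)| is convex in (x, y), so its maximum over a polytope is attained at a vertex.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a vertex of the polytope in (x, y) space,
# stored as (input coordinates X_1..X_p, output value Y).
Vertex = Tuple[Sequence[float], float]


def residual(w: Sequence[float], b: float, x: Sequence[float], y: float) -> float:
    """Signed deviation along Y between the point (x, y) and the hyperplane y = w.x + b."""
    return y - (sum(wi * xi for wi, xi in zip(w, x)) + b)


def worst_case_deviation(w: Sequence[float], b: float, vertices: List[Vertex]) -> float:
    """Largest |deviation| along Y over a polytope given by its vertices.

    |y - (w.x + b)| is a convex function of (x, y), so its maximum over a
    bounded convex polyhedron is attained at a vertex; checking vertices suffices.
    """
    return max(abs(residual(w, b, x, y)) for x, y in vertices)


def eps_insensitive_loss(w: Sequence[float], b: float, vertices: List[Vertex], eps: float) -> float:
    """Robust epsilon-insensitive loss: zero only if the whole polytope lies in the eps-tube."""
    return max(0.0, worst_case_deviation(w, b, vertices) - eps)


# Usage: the unit square in the (X_1, Y) plane, hyperplane y = 0.5 * x + 0.25.
square: List[Vertex] = [([0.0], 0.0), ([1.0], 0.0), ([1.0], 1.0), ([0.0], 1.0)]
w, b = [0.5], 0.25

print(residual(w, b, [0.5], 0.5))                        # centre point: 0.0
print(worst_case_deviation(w, b, square))                # 0.75, at vertices (1, 0) and (0, 1)
print(eps_insensitive_loss(w, b, square, eps=0.5))       # 0.25
print(eps_insensitive_loss(w, b, square, eps=1.0))       # 0.0
```

The example shows how the polyhedral view differs from the point view. The square's centre lies exactly on the hyperplane, but the worst-case deviation over the whole square is 0.75. Vertex enumeration does not scale well: a polyhedron described by 2p inequalities, such as the p-dimensional hypercube, can have 2^p vertices. This is one reason a dual, Farkas-based reformulation over the inequality description is preferable in practice.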

Details

ISSN :
1572-9338 and 0254-5330
Volume :
303
Database :
OpenAIRE
Journal :
Annals of Operations Research
Accession number :
edsair.doi...........5773628c47cab5249cadde597be6eb1a