1. Queries with Arithmetic on Incomplete Databases
- Author
-
Marco Console, Matthias F. J. Hofer, Leonid Libkin, University of Edinburgh, Value from Data (VALDA ), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Work partially supported by EPSRC grants M025268, N023056, and S003800. Part of this work was done while, the third author was at IRIF, Université de Paris, and DI-ENS, supported by a grant from the Foundation Sciences Mathématiques de Paris., Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Inria de Paris, École normale supérieure - Paris (ENS-PSL), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
- Subjects
Numerical data ,Correctness ,approximate query answers ,asymptotic behavior ,first-order queries ,incomplete information ,measure of certainty ,missing data ,numerical data ,query answering ,Computer science ,Missing data ,0102 computer and information sciences ,computer.software_genre ,01 natural sciences ,Measure (mathematics) ,Set (abstract data type) ,Complete information ,Arithmetic ,Computer Science::Databases ,[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB] ,Approximatequery answers ,Database ,Multiplicative function ,First-order queries ,Query an-swering ,16. Peace & justice ,Asymptotic behavior ,Measure of certainty ,Null (SQL) ,010201 computation theory & mathematics ,Conjunctive query ,computer ,Incomplete information - Abstract
International audience; The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases,relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies,we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behaviorof volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach.
- Published
- 2020
- Full Text
- View/download PDF