Embedded functional dependencies and data-completeness tailored database design
- Authors
Sebastian Link and Ziheng Wei
- Subjects
Interpretation (logic), Theoretical computer science, Computer science, General Engineering, Database schema, 02 engineering and technology, Third normal form, Boyce–Codd normal form, Missing data, Database design, 020204 information systems, Schema (psychology), 0202 electrical engineering, electronic engineering, information engineering, Redundancy (engineering), Benchmark (computing), 020201 artificial intelligence & image processing, Completeness (statistics), Functional dependency, Information Systems
- Abstract
We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.
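The abstract does not spell out when an embedded functional dependency holds. As a minimal sketch, assuming the reading that an embedded FD written E: X → Y (with X and Y contained in E) holds on a relation when the tuples that have no missing value on any attribute of E satisfy the ordinary FD X → Y, a satisfaction check could look like the code below. The names `complete_on` and `satisfies_efd`, the dict-per-row encoding, and the use of `None` for a missing value are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Any, Dict, Iterable, Optional, Set

# Hypothetical encoding: a row maps attribute names to values; None marks a
# missing value, and no particular interpretation of None is assumed.
Row = Dict[str, Optional[Any]]


def complete_on(row: Row, attrs: Set[str]) -> bool:
    """True if the row has a non-missing value on every attribute in attrs."""
    return all(row.get(a) is not None for a in attrs)


def satisfies_efd(rows: Iterable[Row], scope: Set[str],
                  lhs: Set[str], rhs: Set[str]) -> bool:
    """Check the embedded FD  scope: lhs -> rhs  (assumed semantics).

    Only rows complete on `scope` are considered; among those, any two rows
    that agree on `lhs` must also agree on `rhs`.
    """
    if not (lhs | rhs) <= scope:
        raise ValueError("lhs and rhs must be subsets of the scope")
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        if not complete_on(row, scope):
            continue  # rows with a missing value on the scope are ignored
        key = tuple(row[a] for a in sorted(lhs))
        val = tuple(row[a] for a in sorted(rhs))
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


if __name__ == "__main__":
    rows = [
        {"emp": "e1", "dept": "d1", "mgr": "m1"},
        {"emp": "e2", "dept": "d1", "mgr": "m1"},
        {"emp": "e3", "dept": "d1", "mgr": None},  # manager missing
        {"emp": "e4", "dept": "d2", "mgr": "m2"},
    ]
    print(satisfies_efd(rows, {"dept", "mgr"}, {"dept"}, {"mgr"}))  # True
    rows.append({"emp": "e5", "dept": "d2", "mgr": "m3"})
    print(satisfies_efd(rows, {"dept", "mgr"}, {"dept"}, {"mgr"}))  # False
```

Under the assumed semantics the row for e3 falls outside the scope and is ignored, whereas a check that treated None as an ordinary value would report dept → mgr as violated. Because incomplete rows are skipped rather than compared, the result does not depend on whether a missing value is read as unknown or nonexistent, which is one way to read the abstract's claim that the notion is independent of the interpretation of missing values.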
- Published
2019