1. Increasing the usability of graph databases by learning SPARQL queries and RDF data
- Author
- Caceres, Gonzalo Diaz and Benedikt, Michael
- Subjects
- 005.74, Computer science
- Abstract
- Semantic Web technologies and other open standards have the potential to make current open datasets and knowledge bases more interlinked and interoperable, providing interfaces for end-users to access data via powerful high-level query languages such as SPARQL. As a consequence, it becomes ever more important for these data to be efficiently and easily searchable by non-expert audiences. The challenge of increasing the usability of these database systems therefore becomes central. In this thesis, we approach the problem of increasing the usability of querying graph data, in the form of RDF knowledge bases, from different perspectives. We study the learning (more specifically, the reverse engineering) of SPARQL queries, including its theoretical and practical aspects. Along the way we find that a key limitation of these approaches is the completeness of the data, and therefore turn to the learning (in this case, knowledge base completion) of RDF data. Specifically, we begin by studying the definability problem for first-order logic, providing exact complexity bounds. We next provide a theoretical study of reverse engineering in the SPARQL context, formalising variants of the reverse engineering problem and giving bounds on their complexity. We develop algorithms for reverse engineering and perform experimental analyses, showing that they scale well. Additionally, we implement and present a proof-of-concept user application which demonstrates how reverse engineering can guide users who are unfamiliar with both the dataset and SPARQL to desired queries and result sets, based on a query-by-example paradigm. Finally, to address the issue of the completeness of data, we propose a scalable and ontology-aware graph embedding model which allows for fact inference in RDF datasets, providing a data learning approach that is complementary to query learning.
- Published
- 2018