Incorporating information extraction in the relational database model

Authors :: Liat Peterfreund
Yoav Nahshon
Stijn Vansummeren
Source :: WebDB
Publication Year :: 2016
Publisher :: ACM, 2016.
Abstract :: Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. This approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. Towards solving these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners, which provides the means and methods for the model to engage in Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, which makes it possible to automate the typical workflow described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report on initial results concerning its expressive power.

Subjects :: Information retrieval
Relational database
Programming language
Computer science
0102 computer and information sciences
02 engineering and technology
01 natural sciences
Database design
Information extraction
Relational calculus
010201 computation theory & mathematics
020204 information systems
Entity–relationship model
0202 electrical engineering, electronic engineering, information engineering
Relational model
Table (database)
Database model

Database :: OpenAIRE
Journal :: Proceedings of the 19th International Workshop on Web and Databases
Accession number :: edsair.doi...........6ba40d44dd563205ebd5e1b0d7db77f1
Full Text :: https://doi.org/10.1145/2932194.2932200