Wordification: Propositionalization by unfolding relational data into bags of words
- Source :
- Expert Systems with Applications. 42:6442-6456
- Publication Year :
- 2015
- Publisher :
- Elsevier BV, 2015.
Abstract
- We improved the wordification methodology and provide a formal framework and pseudocode. We statistically evaluated comparable algorithms on multiple relational databases. Experiments show favorable results in terms of accuracy and efficiency. Feature simplicity is compensated by n-gram construction and by feature weighting. We implemented the full experimental workflow in the data mining platform ClowdFlows.
- Inductive Logic Programming (ILP) and Relational Data Mining (RDM) address the task of inducing models or patterns from multi-relational data. One of the established approaches to RDM is propositionalization, characterized by transforming a relational database into a single-table representation. This paper presents a propositionalization technique called wordification, which can be seen as a transformation of a relational database into a corpus of text documents. Wordification constructs simple, easy-to-understand features, acting as words in the transformed Bag-of-Words representation. This paper presents the wordification methodology, together with an experimental comparison of several propositionalization approaches on seven relational datasets. The main advantages of the approach are: simple implementation, accuracy comparable to competitive methods, and greater scalability, as it performs several times faster on all experimental databases. Furthermore, the wordification methodology and the evaluation procedure are implemented as executable workflows in the web-based data mining platform ClowdFlows. The implemented workflows also include several other ILP and RDM algorithms, as well as the utility components that were added to the platform to make these techniques accessible to a wider research audience.
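The abstract describes the idea only at a high level, so the following is a minimal sketch of what such a transformation could look like, not the authors' implementation. The toy tables, the `table_attribute_value` word format, the `__` separator for n-grams, and the helper names `row_words`, `wordify`, and `tfidf` are all assumptions made for illustration. Each main-table row becomes one document, each attribute value of its linked rows becomes a word, conjunctions of words from the same row stand in for n-gram construction, and TF-IDF stands in for feature weighting.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy database: one main table and one related table.
# The "direction" column plays the role of a class label and is not wordified.
trains = [
    {"id": 1, "direction": "east"},
    {"id": 2, "direction": "west"},
]
cars = [
    {"train_id": 1, "shape": "rectangle", "roof": "none"},
    {"train_id": 1, "shape": "rectangle", "roof": "peaked"},
    {"train_id": 2, "shape": "hexagon", "roof": "flat"},
]


def row_words(table, row, skip, max_n=2):
    """Turn one row into words, plus conjunctions of up to max_n words."""
    items = sorted(f"{table}_{k}_{v}" for k, v in row.items() if k not in skip)
    words = []
    for n in range(1, max_n + 1):
        words.extend("__".join(combo) for combo in combinations(items, n))
    return words


def wordify(main_rows, related_rows, max_n=2):
    """Build one bag of words (a document) per main-table row."""
    docs = {}
    for main in main_rows:
        doc = []
        for rel in related_rows:
            if rel["train_id"] == main["id"]:
                doc.extend(row_words("cars", rel, skip={"train_id"}, max_n=max_n))
        docs[main["id"]] = doc
    return docs


def tfidf(docs):
    """Weight each word by relative term frequency times log inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs.values() for w in set(doc))
    weights = {}
    for key, doc in docs.items():
        term_freq = Counter(doc)
        weights[key] = {
            w: (count / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, count in term_freq.items()
        }
    return weights


docs = wordify(trains, cars)
weights = tfidf(docs)
```

The resulting `weights` dictionary is a sparse single-table representation (one row per train, one column per word) that any propositional learner could consume. A word that occurs in every document receives weight zero under this particular TF-IDF variant.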
- Subjects :
- Computer science
Relational database
Relational data mining
General Engineering
Statistical relational learning
Machine learning
Computer Science Applications
RDM
Text mining
Workflow
Inductive logic programming
Artificial Intelligence
Scalability
Artificial intelligence
Data mining
Executable
Details
- ISSN :
- 09574174
- Volume :
- 42
- Database :
- OpenAIRE
- Journal :
- Expert Systems with Applications
- Accession number :
- edsair.doi...........11855d7d0a3c4dda631bb61aaafa80bc
- Full Text :
- https://doi.org/10.1016/j.eswa.2015.04.017