Deep truth discovery for pattern-based fact extraction.

Authors :
Ye, Chen
Wang, Hongzhi
Lu, Wenbo
Gao, Jing
Dai, Guojun
Source :
Information Sciences. Nov2021, Vol. 580, p478-494. 17p.
Publication Year :
2021

Abstract

Fact extraction, which aims to extract (entity, attribute, value)-tuples from massive text corpora, is crucial in the area of text data mining. Recent approaches have focused on extracting facts by mining textual patterns with semantic types, where the quality of a pattern is evaluated based on content-based criteria, such as frequency. However, these approaches overlook the dimension of pattern reliability, which reflects how likely the extracted facts are to be correct. As a result, a pattern of good content quality (e.g., high frequency) may still extract incorrect facts. In this study, we consider both pattern reliability and fact trustworthiness in addressing the pattern-based fact extraction problem. To learn the complex relationship between pattern reliability and fact trustworthiness, we propose a novel deep learning model using a hybrid of the CNN and LSTM architectures. For fact embedding, we adopt a CNN to extract a fixed-size representation of each component of the fact, i.e., entity, attribute, and value. For pattern embedding, we represent the pattern as a semantic composition of its extracted fact representations. To de-emphasize noisy facts, we consider fact trustworthiness and frequency during the process of pattern embedding, where the features of the tuple trustworthiness information are extracted by a long short-term memory (LSTM) model. To learn the pattern-fact relational dependency, we train the model with both pattern and tuple labels. Extensive experiments involving three real-world datasets demonstrate that the proposed model significantly improves the quality of the patterns and the extracted facts in pattern-based information extraction. [ABSTRACT FROM AUTHOR]
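
The pattern-embedding step described in the abstract, in which a pattern is composed from the representations of its extracted facts while noisy facts are de-emphasized, can be illustrated with a small sketch. This is not the authors' model: the abstract does not give the composition formula, so the weighting used here (trustworthiness times frequency, normalized to sum to 1) is an assumption, and the function name `pattern_embedding` and its parameters are hypothetical. In the paper, the fact vectors come from a CNN and the trustworthiness features from an LSTM; here both are simply passed in as plain numbers.

```python
from typing import List, Sequence


def pattern_embedding(
    fact_vectors: Sequence[Sequence[float]],
    trust: Sequence[float],
    freq: Sequence[int],
) -> List[float]:
    """Compose a pattern vector as a weighted mean of its fact vectors.

    ASSUMPTION (not from the paper): each fact's weight is
    trustworthiness * frequency, normalized so the weights sum to 1.
    """
    if not fact_vectors:
        raise ValueError("a pattern needs at least one extracted fact")
    if not (len(fact_vectors) == len(trust) == len(freq)):
        raise ValueError("need one trust score and one frequency per fact")

    # Unnormalized weight per fact: low-trust or rare facts count for less.
    raw = [t * f for t, f in zip(trust, freq)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    weights = [w / total for w in raw]

    # Weighted sum, computed dimension by dimension.
    dim = len(fact_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, fact_vectors))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two toy 2-d fact vectors; the second fact is judged far less trustworthy.
    facts = [[1.0, 0.0], [0.0, 1.0]]
    print(pattern_embedding(facts, trust=[0.9, 0.1], freq=[1, 1]))
```

With these toy inputs the resulting pattern vector lies close to the first, more trustworthy fact, which is the intended effect of de-emphasizing noisy facts.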

Subjects

Subjects :
*DEEP learning
*DATA mining

Details

Language :
English
ISSN :
00200255
Volume :
580
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
153291233
Full Text :
https://doi.org/10.1016/j.ins.2021.08.084