1. StuffIE
- Author
-
Werner Nutt, Radityo Eko Prasojo, and Mouna Kacimi
- Subjects
Semantic HTML ,Computer science ,business.industry ,WordNet ,02 engineering and technology ,computer.software_genre ,Set (abstract data type) ,03 medical and health sciences ,Information extraction ,0302 clinical medicine ,Knowledge extraction ,Dependency grammar ,030221 ophthalmology & optometry ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts. An example is to include the time, the location, and the duration of a specific fact. These relations can be even more complex to extract in advanced domains such as news, where events typically come with different facets including reasons, consequences, purposes, involved parties, and related events. The main challenge consists in first finding the set of facets related to each fact, and second tagging those facets to the relevant category. In this paper, we tackle the above problems by proposing StuffIE, a fine-grained information extraction approach which is facet-centric. We exploit the Stanford dependency parsing enhanced by lexical databases such as WordNet to extract nested triple relations. Then, we exploit the syntactical dependencies to semantically tag facets using distant learning based on Oxford dictionary. We have tested the accuracy of the extracted facets and their semantic tags using DUC'04 dataset. The results show the high accuracy and coverage of our approach with respect to ClausIE, OLLIE, SEMAFOR SRL and Illinois SRL.
- Published
- 2018
- Full Text
- View/download PDF