Document Information Extraction and its Evaluation based on Client's Relevance

Authors :
Abdel Belaïd
K. C. Santosh
Recognition of writing and analysis of documents (READ)
Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
ITESOFT
Source :
ICDAR-International Conference on Document Analysis and Recognition-2013, Aug 2013, Washington DC, United States. ⟨10.1109/ICDAR.2013.16⟩
Publication Year :
2013
Publisher :
HAL CCSD, 2013.

Abstract

In this paper, we present a model-based approach to extracting information content from documents and perform an in-depth evaluation based on clients' relevance. Real-world users (i.e., clients) first provide a set of key fields from the document image that they consider important. These fields are represented as a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. This attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time such graphs are extracted, they reinforce or update the initial graph, iteratively producing a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
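
The graph described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `Field` class, the `spatial_relation` and `build_arg` helpers, the four coarse relations (`left_of`, `right_of`, `above`, `below`), and the sample field labels and coordinates are all hypothetical. The abstract does not specify which spatial relations or node features the paper uses.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """A key field selected by a client (hypothetical representation)."""
    label: str   # semantic label, e.g. "date" or "total"
    x: float     # centre of the field's bounding box, in pixels
    y: float     # image coordinates: y grows downwards


def spatial_relation(a: Field, b: Field) -> str:
    """Coarse position of b relative to a, taken from the dominant axis offset."""
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Return (nodes, edges) of a small attributed relational graph.

    nodes maps each label to its Field; edges maps each ordered pair of
    labels to the (assumed) spatial relation between the two fields.
    """
    nodes = {f.label: f for f in fields}
    edges = {
        (a.label, b.label): spatial_relation(a, b)
        for a, b in permutations(fields, 2)
    }
    return nodes, edges


# Made-up fields such as a client might select on an invoice image.
fields = [
    Field("invoice_no", 100, 50),
    Field("date", 400, 55),
    Field("total", 410, 700),
]
nodes, edges = build_arg(fields)
print(edges[("invoice_no", "date")])  # right_of
print(edges[("date", "total")])       # below
```

In this sketch the graph is fully connected and the relations are purely geometric. A closer rendering of the approach would also carry the node features the abstract mentions and would update the graph as similar graphs are mined from new document images.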

Details

Language :
English
Database :
OpenAIRE
Journal :
ICDAR-International Conference on Document Analysis and Recognition-2013, Aug 2013, Washington DC, United States. ⟨10.1109/ICDAR.2013.16⟩
Accession number :
edsair.doi.dedup.....1776451a72be8e63148fee4c2bcbaa09
Full Text :
https://doi.org/10.1109/ICDAR.2013.16