Document Information Extraction and its Evaluation based on Client's Relevance
- Source :
- ICDAR-International Conference on Document Analysis and Recognition-2013, Aug 2013, Washington DC, United States. ⟨10.1109/ICDAR.2013.16⟩
- Publication Year :
- 2013
- Publisher :
- HAL CCSD, 2013.
Abstract
- In this paper, we present a model-based approach to extracting information content from documents and evaluate it in depth based on clients' relevance. Real-world users (i.e., clients) first provide a set of key fields from the document image that they consider important. These fields are represented as a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. This attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time similar graphs are extracted, they are used to reinforce or update the initial graph, iteratively producing a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
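
The attributed relational graph described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Field` structure, the field labels, the bounding-box coordinates, and the four-way relation vocabulary (`left_of`, `right_of`, `above`, `below`) are all assumptions chosen for illustration. Nodes carry a semantic label, and each directed edge carries a coarse spatial relation derived from the centers of the field bounding boxes.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """A key field selected by a client (hypothetical structure)."""
    label: str   # semantic label, e.g. "quantity" (assumed vocabulary)
    x: float     # left edge of the bounding box
    y: float     # top edge (image coordinates, y grows downward)
    w: float     # box width
    h: float     # box height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def spatial_relation(a: Field, b: Field) -> str:
    """Coarse position of `a` relative to `b`, based on box centers."""
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        # Horizontal offset dominates.
        return "left_of" if dx > 0 else "right_of"
    # Vertical offset dominates; y grows downward in image coordinates.
    return "above" if dy > 0 else "below"


def build_arg(fields):
    """Return (nodes, edges): node labels and directed edge relations."""
    nodes = {i: f.label for i, f in enumerate(fields)}
    edges = {
        (i, j): spatial_relation(fields[i], fields[j])
        for i, j in permutations(range(len(fields)), 2)
    }
    return nodes, edges


# Hypothetical fields a client might pick from an invoice-like table.
fields = [
    Field("description", x=10, y=100, w=200, h=20),
    Field("quantity", x=250, y=100, w=40, h=20),
    Field("total", x=250, y=300, w=60, h=20),
]
nodes, edges = build_arg(fields)
```

A graph built this way could serve as the initial pattern to match against other regions of a document image; the iterative reinforcement and update step mentioned in the abstract is not covered by this sketch.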
- Subjects :
- Information retrieval
Graph database
Wait-for graph
Computer science
[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]
020207 software engineering
Graph theory
02 engineering and technology
Document clustering
computer.software_genre
Graph
Spatial relation
Information extraction
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
computer
Image retrieval
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- ICDAR-International Conference on Document Analysis and Recognition-2013, Aug 2013, Washington DC, United States. ⟨10.1109/ICDAR.2013.16⟩
- Accession number :
- edsair.doi.dedup.....1776451a72be8e63148fee4c2bcbaa09
- Full Text :
- https://doi.org/10.1109/ICDAR.2013.16