1. Entity Local Structure Graph Matching for Mislabeling Correction
- Author
- Vincent Poulain d'Andecy, Nihel Kooli, Aurelie Joseph, Abdel Belaïd, Recognition of writing and analysis of documents (READ), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Itesoft R&D, ITESOFT
- Subjects
structure model, Matching (graph theory), Computer science, Comparison results, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, entity local structure, Local structure, Specific cost, graph clustering, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV], subgraph matching, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Business documents, Artificial intelligence, mislabeling correction, Structured model, Structure comparison, Clustering coefficient
- Abstract
This paper proposes an approach for comparing entity local structures based on inexact subgraph matching. The comparison results are used to correct mislabeling in the local structure. A local structure is a set of entity attribute labels that are physically close in a document image. It is modeled by an attributed graph whose nodes describe the content and presentation features of the labels and whose arcs describe the geometrical features. A local structure graph is matched against a structure model, which is a set of local structure model graphs. The structure model is initially built from a set of well-chosen local structures using a graph clustering algorithm and is then incrementally updated. The subgraph matching adopts a specific cost function that integrates the feature dissimilarities. The matched model graph is used to extract the missed labels, prune the extraneous ones and correct the erroneous label fields in the local structure. Evaluated on 525 local structures extracted from 200 business documents, the structure comparison approach achieves about 90% recall and 95% precision. The mislabeling correction rates in these local structures vary between 73% and 100%.
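The matching and correction steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the node tuple layout, the `node_cost` and `arc_cost` definitions, the weights `w_node` and `w_arc`, and the brute-force search are all assumptions chosen for illustration. The sketch also assumes the local structure has no more nodes than the model graph, so it does not cover pruning of extraneous labels.

```python
from itertools import permutations
from math import hypot

# Hypothetical toy representation: a node is (label, is_numeric, x, y).
# Arc features are derived as the offset between two node positions.

def node_cost(a, b):
    """Node dissimilarity: label mismatch plus content-type mismatch."""
    return (a[0] != b[0]) * 1.0 + (a[1] != b[1]) * 1.0

def arc_cost(a1, a2, b1, b2):
    """Arc dissimilarity: distance between the two offset vectors."""
    dxa, dya = a2[2] - a1[2], a2[3] - a1[3]
    dxb, dyb = b2[2] - b1[2], b2[3] - b1[3]
    return hypot(dxa - dxb, dya - dyb)

def match(local, model, w_node=1.0, w_arc=0.1):
    """Brute-force inexact subgraph matching (assumed weights): map each
    local node to a distinct model node, minimising the total cost."""
    best_cost, best_map = float("inf"), None
    n = len(local)
    for perm in permutations(range(len(model)), n):
        cost = sum(w_node * node_cost(local[i], model[perm[i]]) for i in range(n))
        cost += sum(
            w_arc * arc_cost(local[i], local[k], model[perm[i]], model[perm[k]])
            for i in range(n) for k in range(i + 1, n)
        )
        if cost < best_cost:
            best_cost, best_map = cost, perm
    return best_cost, best_map

def correct(model, mapping):
    """Take labels from the matched model nodes; report unmatched model labels."""
    corrected = [model[j][0] for j in mapping]
    missing = [model[j][0] for j in range(len(model)) if j not in mapping]
    return corrected, missing

# Toy data: the second local label is wrongly tagged "city", and the
# real city field was not extracted at all.
model = [("name", False, 0, 0), ("street", False, 0, 10),
         ("zip", True, 0, 20), ("city", False, 30, 20)]
local = [("name", False, 5, 5), ("city", False, 5, 15), ("zip", True, 5, 25)]

cost, mapping = match(local, model)
corrected, missing = correct(model, mapping)
print(mapping, corrected, missing)
```

In this toy case the geometric term outweighs the label agreement, so the node tagged "city" is matched to the model's "street" node and relabeled, while "city" is reported as a missed label. Exhaustive enumeration is only practical for very small graphs; it stands in here for whatever search strategy the paper actually uses.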
- Published
- 2016