1. Sample-based collection and adjustment of rules for metadata extraction in business documents.
- Author
- Matsumoto, Toshiko; Oba, Mitsuharu; Onoyama, Takashi; Akiyoshi, Masanori
- Subjects
- *METADATA, *ACQUISITION of data, *RULES, *BUSINESS records management data processing, *ALGORITHMS, *KEYWORDS, *JAPANESE people
- Abstract
- Toward facile introduction of metadata-based document management systems, we propose an algorithm which uses sample documents and their manually specified metadata as training data, and generates metadata-extraction rules. Our algorithm enumerates candidates of keywords and layout characteristics specific to the metadata on the basis of metadata occurrence in the training data. It then examines whether each candidate is specific to only one kind of metadata. In an experiment on Japanese business documents and weekly reports, automatically generated rules achieved metadata extraction as accurate as manually adjusted ones. © 2012 Wiley Periodicals, Inc. Electron Comm Jpn, 95(6): 1-11, 2012; Published online in Wiley Online Library. DOI 10.1002/ecj.11373 [ABSTRACT FROM AUTHOR]
- Published
- 2012