
Efficient dictionary-based text rewriting using subsequential transducers

Authors :
S. MIHOV
K. U. SCHULZ
Source :
Natural Language Engineering; Dec 2007, Vol. 13 Issue 4, p353-381, 29p
Publication Year :
2007

Abstract

Problems in the area of text and document processing can often be described as text rewriting tasks: given an input text, produce a new text by applying some fixed set of rewriting rules. In its simplest form, a rewriting rule is given by a pair of strings, representing a source string (the "original") and its substitute. By a rewriting dictionary, we mean a finite list of such pairs; dictionary-based text rewriting means replacing occurrences of originals in an input text with their substitutes. We present an efficient method for constructing, given a rewriting dictionary D, a subsequential transducer that accepts any text t as input and outputs the intended rewriting result t' under the so-called "leftmost-longest match" replacement with skips. The time needed to compute the transducer is linear in the size of the input dictionary. Given the transducer, any text t of length |t| is rewritten deterministically in time O(|t|+|t'|), where t' denotes the resulting output text. Hence the resulting rewriting mechanism is very efficient. As a second advantage, using standard tools, the transducer can be directly composed with other transducers to efficiently solve more complex rewriting tasks in a single processing step. [ABSTRACT FROM AUTHOR]
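To make the "leftmost-longest match with skips" semantics concrete, the following is a minimal Python sketch of the input/output behaviour only, under our reading of the abstract: scan left to right, at each position take the longest original that starts there, emit its substitute and resume after it; if no original starts there, copy ("skip") one character. It is not the authors' subsequential-transducer construction. It walks a plain character trie (a hypothetical helper, `build_trie`), so it runs in O(|t| · L) time for a longest original of length L, rather than the O(|t|+|t'|) the abstract claims for the transducer. The function names, the ban on empty originals, and the rule that a later duplicate original overwrites an earlier one are illustrative assumptions.

```python
from typing import List, Optional, Tuple

def build_trie(dictionary: List[Tuple[str, str]]) -> dict:
    """Hypothetical helper: character trie over the originals.
    The key None marks the end of an original and stores its substitute."""
    root: dict = {}
    for original, substitute in dictionary:
        if not original:
            raise ValueError("empty originals are not allowed (assumption)")
        node = root
        for ch in original:
            node = node.setdefault(ch, {})
        node[None] = substitute  # a later duplicate overwrites (assumption)
    return root

def rewrite_leftmost_longest(text: str, dictionary: List[Tuple[str, str]]) -> str:
    """Reference semantics for leftmost-longest replacement with skips
    (naive trie scan, NOT the paper's subsequential transducer)."""
    trie = build_trie(dictionary)
    out: List[str] = []
    i, n = 0, len(text)
    while i < n:
        node, j = trie, i
        best_end: int = -1
        best_sub: Optional[str] = None
        # Follow the trie as far as the text allows, remembering the
        # longest original that ends along the way.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_sub = j, node[None]
        if best_sub is not None:
            out.append(best_sub)  # replace the longest match starting at i
            i = best_end          # resume after it (no overlapping matches)
        else:
            out.append(text[i])   # skip: copy an unmatched character
            i += 1
    return "".join(out)
```

For example, with D = [("a", "x"), ("ab", "y"), ("bc", "z")], the text "abc" becomes "yc": at position 0 the longer original "ab" wins over "a", and the following "c" starts no original, so it is copied unchanged. The overlapping candidate "bc" is never used, because matching resumes after "ab".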

Details

Language :
English
ISSN :
13513249
Volume :
13
Issue :
4
Database :
Complementary Index
Journal :
Natural Language Engineering
Publication Type :
Academic Journal
Accession number :
27676463
Full Text :
https://doi.org/10.1017/S1351324905004092