1. Efficient dictionary-based text rewriting using subsequential transducers
- Author
S. MIHOV and K. U. SCHULZ
- Subjects
WRITING, LANGUAGE & languages, ENCYCLOPEDIAS & dictionaries, TRANSDUCERS, ELECTRONIC equipment
- Abstract
Problems in the area of text and document processing can often be described as text rewriting tasks: given an input text, produce a new text by applying some fixed set of rewriting rules. In its simplest form, a rewriting rule is given by a pair of strings, representing a source string (the "original") and its substitute. By a rewriting dictionary, we mean a finite list of such pairs; dictionary-based text rewriting means replacing occurrences of originals in an input text by their substitutes. We present an efficient method for constructing, given a rewriting dictionary D, a subsequential transducer that accepts any text t as input and outputs the intended rewriting result t' under the so-called "leftmost-longest match" replacement with skips. The time needed to compute the transducer is linear in the size of the input dictionary. Given the transducer, any text t of length |t| is rewritten deterministically in time O(|t| + |t'|), where t' denotes the resulting output text. Hence the resulting rewriting mechanism is very efficient. As a second advantage, the transducer can be composed directly with other transducers using standard tools, so that more complex rewriting tasks can be solved efficiently in a single processing step. [ABSTRACT FROM AUTHOR]
- Published
- 2007
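The replacement semantics named in the abstract can be made concrete with a small sketch. This is a minimal Python illustration, not the authors' subsequential-transducer construction. It assumes that "with skips" means characters not covered by any match are copied unchanged, that every original is non-empty, and that each original has a single substitute. The names `build_trie` and `rewrite` are hypothetical. A plain trie scan like this one can take O(|t|·m) time in the worst case, where m is the length of the longest original; the transducer described in the abstract is meant to produce the same t' deterministically in O(|t| + |t'|).

```python
"""Reference sketch of leftmost-longest dictionary rewriting with skips.

Hypothetical helper names; this is NOT the subsequential-transducer
construction from the paper, only a plain trie scan that spells out
which output t' the rewriting is meant to produce.
"""
from typing import Dict, List, Tuple

_END = object()  # sentinel key: marks a trie node where an original ends


def build_trie(dictionary: List[Tuple[str, str]]) -> Dict:
    """Insert every (original, substitute) pair into a character trie."""
    root: Dict = {}
    for original, substitute in dictionary:
        if not original:
            raise ValueError("originals must be non-empty")
        node = root
        for ch in original:
            node = node.setdefault(ch, {})
        node[_END] = substitute  # assumption: a later duplicate overrides
    return root


def rewrite(text: str, dictionary: List[Tuple[str, str]]) -> str:
    """Replace originals by substitutes, leftmost-longest, copying the rest."""
    root = build_trie(dictionary)
    out: List[str] = []
    i, n = 0, len(text)
    while i < n:
        # Walk the trie from position i, remembering the longest match seen.
        node, j = root, i
        best_len, best_sub = 0, None
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                best_len, best_sub = j - i, node[_END]
        if best_sub is None:
            # No original starts at i: "skip" by copying the character.
            out.append(text[i])
            i += 1
        else:
            # Emit the substitute of the longest match and jump past it.
            out.append(best_sub)
            i += best_len
    return "".join(out)


if __name__ == "__main__":
    d = [("ab", "X"), ("abc", "Y"), ("bc", "Z")]
    print(rewrite("abcabd", d))  # YXd
    # Leftmost beats longest: "ab" at 0 wins over the longer "bcd" at 1.
    print(rewrite("abcd", [("ab", "1"), ("bcd", "2")]))  # 1cd
```

Running the module prints `YXd` and `1cd`. The second case shows that a match starting further left takes precedence even over a longer match that starts later, which is what distinguishes leftmost-longest replacement from simply preferring the longest original anywhere in the text.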