1. Text Extraction for Spam-Mail Image Filtering Using a Text Color Estimation Technique.
- Author
Carbonell, Jaime G., Siekmann, Jörg, Okuno, Hiroshi G., Ali, Moonis, Kim, Ji-Soo, Kim, S. H., Yang, H. J., Son, H. J., and Kim, W. P.
- Abstract
In this paper, we propose an algorithm for extracting text regions from images in spam mails. The Color Layer-Based Text Extraction (CLTE) algorithm divides the input image into eight planes that serve as color layers. It extracts connected components from each of the eight planes and then classifies each component as either a text region or a non-text region. We also propose an algorithm to recover damaged text strokes in Korean text images. Damaged strokes are of two types: (1) middle strokes such as ‘⌉’ or ‘—’ are deleted, and (2) first and last strokes such as ‘∘’ or ‘□’ are filled with black pixels. An experiment with 200 spam-mail images shows that the proposed approach is more accurate than conventional methods by more than 10%. [ABSTRACT FROM AUTHOR]
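The abstract does not say how the eight color layers are formed or which features separate text from non-text components. The sketch below is only an illustration of the general layer-then-components pipeline, under loudly stated assumptions: each pixel is assigned to one of 2³ = 8 layers by binarizing its R, G, and B channels at a fixed threshold, components are found with 4-connected flood fill, and a hypothetical size, aspect-ratio, and "not the background" heuristic stands in for the paper's classifier. None of the function names or parameters come from the paper.

```python
from collections import deque

# Assumption: an image is a list of rows, each row a list of (r, g, b) tuples in 0..255.
Pixel = tuple[int, int, int]


def color_layer_index(pixel: Pixel, threshold: int = 128) -> int:
    """Hypothetical scheme: binarize R, G, B and pack the bits into a layer id 0..7."""
    r, g, b = pixel
    return (int(r >= threshold) << 2) | (int(g >= threshold) << 1) | int(b >= threshold)


def split_into_layers(image: list[list[Pixel]], threshold: int = 128) -> list[list[list[bool]]]:
    """Return eight binary planes; each pixel is True in exactly one plane."""
    h, w = len(image), len(image[0])
    layers = [[[False] * w for _ in range(h)] for _ in range(8)]
    for y in range(h):
        for x in range(w):
            layers[color_layer_index(image[y][x], threshold)][y][x] = True
    return layers


def connected_components(plane: list[list[bool]]) -> list[list[tuple[int, int]]]:
    """4-connected components of a binary plane, found by breadth-first flood fill."""
    h, w = len(plane), len(plane[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not plane[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, component = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and plane[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def bounding_box(component: list[tuple[int, int]]) -> tuple[int, int, int, int]:
    ys = [p[0] for p in component]
    xs = [p[1] for p in component]
    return min(ys), min(xs), max(ys), max(xs)


def is_text_like(component, image_h, image_w, min_pixels=3, max_pixels=5000, max_aspect=10.0):
    """Hypothetical stand-in classifier: reject tiny specks, huge blobs,
    very elongated shapes, and components whose box spans the whole image (background)."""
    y0, x0, y1, x1 = bounding_box(component)
    if (y0, x0, y1, x1) == (0, 0, image_h - 1, image_w - 1):
        return False
    height, width = y1 - y0 + 1, x1 - x0 + 1
    aspect = max(height, width) / min(height, width)
    return min_pixels <= len(component) <= max_pixels and aspect <= max_aspect


def extract_text_candidates(image, threshold=128):
    """Return (layer_id, bounding_box) for every component judged text-like."""
    h, w = len(image), len(image[0])
    results = []
    for layer_id, plane in enumerate(split_into_layers(image, threshold)):
        for component in connected_components(plane):
            if is_text_like(component, h, w):
                results.append((layer_id, bounding_box(component)))
    return results
```

Because each layer is processed independently, a red stroke on a white background ends up in its own plane and is not merged with the background, which is the intuition behind splitting by color before labeling components. A real classifier would use richer features than the toy thresholds shown here.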
- Published
- 2007