1. Applying Tesseract-OCR to detection of image spam mails
- Author
- Noriaki Yoshiura and Daisuke Yamakawa
- Subjects
Information retrieval, Network security, Computer science, InformationSystems_INFORMATIONSYSTEMSAPPLICATIONS, Image processing, Optical character recognition, Image spam, Object detection, World Wide Web, Forum spam, ComputingMethodologies_PATTERNRECOGNITION, Spambot, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Tesseract
- Abstract
This paper applies Tesseract-OCR, an optical character recognition program, to filtering image spam mail. Tesseract-OCR can be specialized to a particular language, and this paper specializes it to spam words. This specialization reduces the time and CPU power needed to check whether images in mails contain spam words. The paper evaluates the resulting spam mail filter experimentally.
- Published
- 2012
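The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the `tesseract` command-line tool is installed, uses the stock `eng` language data rather than the spam-word-specific language the paper builds, and the word list, function names, and `threshold` parameter are all hypothetical.

```python
import re
import subprocess

# Hypothetical spam vocabulary; the paper's actual word list is not given here.
SPAM_WORDS = {"viagra", "casino", "lottery", "discount"}


def ocr_image(image_path, lang="eng"):
    """Run the tesseract CLI on an image and return the recognized text.

    Assumes `tesseract` is on PATH. A language specialized to spam words,
    as described in the abstract, would be selected through `lang`.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_spam_words(text, spam_words=SPAM_WORDS):
    """Return the set of spam words that occur as whole tokens in text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens & set(spam_words)


def is_image_spam(image_path, lang="eng", threshold=1):
    """Flag an image when at least `threshold` distinct spam words are read."""
    return len(find_spam_words(ocr_image(image_path, lang))) >= threshold
```

Keeping the word matching separate from the OCR call means the matching step can be checked on plain strings, for example `find_spam_words("Win the LOTTERY now")`, without any image or OCR installation.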