1. Quote examiner: verifying quoted images using web-based text similarity
- Author
Sawinder Kaur, Sneha Banerjee, and Parteek Kumar
- Subjects
Information retrieval, Computer Networks and Communications, business.industry, Computer science, 020207 software engineering, Cloud computing, 02 engineering and technology, computer.file_format, Optical character recognition, computer.software_genre, Hardware and Architecture, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Web application, Formatted text, Social media, business, computer, Software
- Abstract
Digital data has grown rapidly over the last few years, and images containing quotes now spread virally across online social media platforms. Misquotes spread just as quickly, which highlights how little care web users take when circulating poorly cited quotes. It is therefore important to authenticate the content of images circulated online: the text within such images must be retrieved and verified before use, so that a fake or misquoted image can be distinguished from an authentic one. This paper uses Optical Character Recognition (OCR) to convert textual images into readable text, but no OCR tool extracts information from images with perfect accuracy. A post-processing method applied to the retrieved text is therefore proposed to improve the accuracy of the detected text. Google Cloud Vision is used to recognize text from images, and post-processing the extracted text improved recognition accuracy by approximately 3.5%. A web-based text similarity approach (using URLs and domain names) is used to examine the authenticity of the content of the quoted images, achieving approximately 96.26% accuracy in classifying quoted images as verified or misquoted. A ground truth dataset of authentic site names has also been created. The experiments use images with quotes by famous celebrities and global leaders. A comparative analysis shows the effectiveness of the proposed algorithm.
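The abstract does not spell out the similarity measure or the decision rule, so the following is only a minimal sketch of how a web-based check of this kind might look, not the authors' algorithm. It assumes the OCR text and a list of (URL, snippet) search results are already available; the names `TRUSTED_DOMAINS`, `verify_quote`, the placeholder domains, and the 0.8 threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the paper actually uses.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical stand-in for the ground truth dataset of authentic site names.
TRUSTED_DOMAINS = {"example.org", "quotes.example.com"}


def normalize(text):
    """Simple post-processing: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Character-level similarity ratio in [0, 1] between two normalized texts."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def domain_of(url):
    """Extract the host name from a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def verify_quote(ocr_text, results, threshold=0.8):
    """Label a quote 'verified' if a trusted site carries a close match.

    `results` is an iterable of (url, snippet) pairs; the threshold is an
    assumed value, not one taken from the paper.
    """
    for url, snippet in results:
        if domain_of(url) in TRUSTED_DOMAINS and similarity(ocr_text, snippet) >= threshold:
            return "verified"
    return "misquoted"
```

In this sketch a quote counts as verified only when both conditions hold: the text closely matches a snippet, and that snippet comes from a domain on the trusted list. A close match on an unlisted site is treated as misquoted.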
- Published
- 2021