1. Discriminative Feature Spamming Technique for Roman Urdu Sentiment Analysis
- Author
- Khawar Mehmood, Daryl Essam, Muhammad Kamran Malik, and Kamran Shafi
- Subjects
General Computer Science; Computer science; Natural languages; 02 engineering and technology; Field (computer science); Roman Urdu; Set (abstract data type); Discriminative model; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); General Materials Science; Sentiment analysis; General Engineering; 020207 software engineering; Weighting; Term (time); Spamming; 020201 artificial intelligence & image processing; lcsh:Electrical engineering. Electronics. Nuclear engineering; Artificial intelligence; lcsh:TK1-9971; Natural language processing
- Abstract
Term weighting is a widely used approach that assigns weights to terms in order to improve the performance of information retrieval or text categorization tasks. In this paper, we present a novel term weighting technique, called the discriminative feature spamming technique (DFST), which identifies distinctive terms based on a term utility criteria (TUC) and then spams them to increase their discriminative power. The experimental results show that DFST outperformed a set of time-tested term weighting schemes from the information retrieval field. All experiments were performed on the largest Roman Urdu (RU) dataset to date, comprising 11,000 reviews collected and annotated for this work. In addition, a custom tokenizer was built, which further improved classification accuracy. A cross-scheme comparison showed that the improvements obtained with the newly proposed DFST over previous approaches were statistically significant.
- Published
- 2019