Discriminative Feature Spamming Technique for Roman Urdu Sentiment Analysis
- Author
Khawar Mehmood, Daryl Essam, Kamran Shafi, and Muhammad Kamran Malik
- Subjects
Natural languages, natural language processing, sentiment analysis, Roman Urdu, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Term weighting is one of the most commonly used approaches for improving the performance of information retrieval and text categorization tasks; it works by assigning weights to terms. In this paper, we present a novel term weighting technique, called the discriminative feature spamming technique (DFST), which identifies distinctive terms based on a term utility criterion (TUC) and then spams them to increase their discriminative power. The experimental results show that DFST outperformed a set of time-tested term weighting schemes from the information retrieval field. All experiments were performed on the largest Roman Urdu (RU) dataset to date, comprising 11,000 reviews, which was collected and annotated for this work. In addition, a custom tokenizer was built, which further improved classification accuracy. A cross-scheme comparison showed that the results obtained with the newly proposed DFST were better than those of previous approaches and that the differences were statistically significant.
- Published
- 2019
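The abstract describes DFST only at a high level: select terms with a term utility criterion (TUC), then "spam" them. The sketch below is one possible reading, not the authors' method. It assumes that spamming means repeating selected terms inside each document's token stream before bag-of-words counting, and it substitutes a simple class-purity score for the TUC, whose actual definition is not given here. The function names (`term_utility`, `select_discriminative`, `spam`), the thresholds, and the toy Roman Urdu reviews are all hypothetical.

```python
from collections import Counter, defaultdict


def term_utility(docs, labels):
    """Score each term by class purity: the share of documents containing the
    term that belong to the term's majority class.

    ASSUMPTION: a hypothetical stand-in for the paper's term utility criterion
    (TUC), which the abstract does not define.
    Returns (utility, support) dicts keyed by term; support is document frequency.
    """
    class_df = defaultdict(Counter)  # term -> {label: number of docs containing term}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency: count each term once per doc
            class_df[term][label] += 1
    utility, support = {}, {}
    for term, counts in class_df.items():
        total = sum(counts.values())
        utility[term] = max(counts.values()) / total
        support[term] = total
    return utility, support


def select_discriminative(docs, labels, min_utility=0.8, min_df=2):
    """Keep terms that are class-pure and frequent enough (thresholds are arbitrary)."""
    utility, support = term_utility(docs, labels)
    return {t for t, u in utility.items() if u >= min_utility and support[t] >= min_df}


def spam(tokens, discriminative, copies=2):
    """Append `copies` extra occurrences for every occurrence of a discriminative
    term, multiplying its raw count by (copies + 1); other terms are untouched.
    ASSUMPTION: this repetition is our reading of "spamming"."""
    out = list(tokens)
    for term in tokens:
        if term in discriminative:
            out.extend([term] * copies)
    return out


# Toy Roman Urdu reviews (illustrative only, not from the paper's dataset).
docs = [
    ["film", "bohat", "achi", "hai"],
    ["achi", "kahani", "hai"],
    ["film", "bekaar", "hai"],
    ["bekaar", "acting", "hai"],
]
labels = ["pos", "pos", "neg", "neg"]

selected = select_discriminative(docs, labels)
print(sorted(selected))  # ['achi', 'bekaar']
print(spam(docs[2], selected))  # ['film', 'bekaar', 'hai', 'bekaar', 'bekaar']

# Bag-of-words counts after spamming, ready for any downstream classifier.
spammed_counts = [Counter(spam(d, selected)) for d in docs]
```

Because this version of spamming only changes raw term counts, it could be placed in front of any count-based weighting (plain TF, TF-IDF, and so on); how DFST actually combines with a classifier is described in the paper itself.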