Amplifying document categorization with advanced features and deep learning.

Authors :
Kavitha, M.
Akila, K.
Source :
Multimedia Tools & Applications; Aug2024, Vol. 83 Issue 26, p68087-68105, 19p
Publication Year :
2024

Abstract

The field of natural language processing (NLP) plays a pivotal role in discerning unstructured data from diverse origins. This study employs advanced techniques rooted in machine learning and deep learning to effectively categorize news articles. Notably, deep learning models have demonstrated superior performance over traditional machine learning algorithms, rendering them a popular choice for a range of NLP tasks. The research employs feature extraction techniques to identify multiword tokens, negation words, and out-of-vocabulary words and replace them. Additionally, convolutional neural network models leverage embedding, convolutional layers, and max pooling layers to capture intricate features. For tasks requiring an understanding of dependencies among long phrases, long short-term memory models come into play. The evaluation of the proposed model hinges on training it with datasets like AG News, BBC, and 20 Newsgroups, gauging its efficacy. The study delves into the myriad challenges inherent to text classification. These challenges are thoughtfully discussed, shedding light on the intricacies of the process. Furthermore, the research furnishes comprehensive test outcomes for both conventional machine learning and deep learning models. The significance of this proposed model is that it uses a multiword expression lexicon, WordNet synsets, and word embedding techniques for feature extraction. The performance of the models is increased when using these feature extraction techniques. [ABSTRACT FROM AUTHOR]
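
The feature extraction step named in the abstract (identifying multiword tokens, negation words, and out-of-vocabulary words and replacing them) might be sketched roughly as follows. The abstract does not say how the replacement is performed, so everything here is an assumption for illustration: the tiny lexicon, the rule that joins a negation word to the token after it, the `<unk>` placeholder, and all function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical resources for illustration only; the paper's actual
# multiword expression lexicon and vocabulary are not reproduced here.
MWE_LEXICON = {("new", "york"), ("stock", "market"), ("prime", "minister")}
NEGATION_WORDS = {"not", "no", "never"}
VOCABULARY = {"the", "new_york", "stock_market", "did", "not_crash", "today"}
UNK = "<unk>"  # assumed placeholder for out-of-vocabulary tokens


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def merge_multiword(tokens, lexicon=MWE_LEXICON, max_len=3):
    """Greedily join the longest lexicon match at each position with '_'."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword match starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


def attach_negation(tokens, negations=NEGATION_WORDS):
    """Assumed rule: fuse a negation word with the token that follows it."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in negations and i + 1 < len(tokens):
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def replace_oov(tokens, vocabulary=VOCABULARY, unk=UNK):
    """Replace every token missing from the vocabulary with the placeholder."""
    return [t if t in vocabulary else unk for t in tokens]


def preprocess(text):
    """Run the three assumed steps in order on raw text."""
    return replace_oov(attach_negation(merge_multiword(tokenize(text))))


print(preprocess("The New York stock market did not crash yesterday."))
```

In a pipeline like the one the abstract describes, the resulting token list would then be mapped to embeddings before reaching the convolutional or LSTM layers; that stage is outside this sketch.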

Details

Language :
English
ISSN :
13807501
Volume :
83
Issue :
26
Database :
Complementary Index
Journal :
Multimedia Tools & Applications
Publication Type :
Academic Journal
Accession number :
178530042
Full Text :
https://doi.org/10.1007/s11042-024-18483-7