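1. Performance Comparison Between Naive Bayes and Machine Learning Algorithms in News Classification Systems [Haber Sınıflandırma Sistemlerinde Naive Bayes ve Makine Öğrenmesi Algoritmaları Arasında Performans Karşılaştırması].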
- Author
- VEZİROĞLU, Merve and BUCAK, İhsan Ömür
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *LOGISTIC regression analysis, *ALGORITHMS, *CLASSIFICATION
- Abstract
The rapid growth of digital text content has significantly increased the demand for automated classification methods, particularly for tasks such as news classification. Natural Language Processing (NLP) techniques can produce results efficiently from large datasets without human intervention. This study presents a Naive Bayes (NB)-based classification system, developed in Python, for categorizing news headlines. NB algorithms are favored for text classification problems because of their simplicity and fast computation. The dataset, derived from BBC News headlines, covers five categories: technology, business, sports, entertainment, and politics. Data preprocessing included text cleaning, stop-word removal, and conversion of the text into numerical features using Count Vectorization, steps that play a critical role in accurate and effective classification. Five NB variants were examined: Gaussian, Multinomial, Complement, Bernoulli, and Tree-Augmented Naive Bayes (TAN). Multinomial NB delivered the best performance with an accuracy of 98.53%. Complement NB achieved 98.31%, TAN 98.20%, and Bernoulli NB 96.74%, while Gaussian NB ranged between 91.79% and 92.92%. The NB algorithms were also compared with other machine learning algorithms: Logistic Regression, Random Forest, Linear Support Vector Classifier, and Multi-Layer Perceptron. The Multi-Layer Perceptron stood out with an accuracy of 98.31%, while the other algorithms also surpassed 97% accuracy. The study demonstrates that NB algorithms provide a robust, reliable, and effective solution for news classification problems, with the Multinomial and Complement variants showing particularly high accuracy. Future research will aim to further improve the performance of these algorithms using larger datasets and new approaches. [ABSTRACT FROM AUTHOR]
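The pipeline described in the abstract (stop-word removal, Count Vectorization, then an NB classifier) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract states only that Python was used, so scikit-learn is an assumption, and the handful of headlines below are hypothetical stand-ins for the BBC News dataset. Gaussian NB (which needs dense input) and TAN (which scikit-learn does not provide) are left out for brevity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, ComplementNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy headlines; the study itself used BBC News headlines.
train_headlines = [
    "Stocks rally as markets rise",
    "Bank profits rise on strong markets",
    "Striker scores twice in cup final",
    "Team wins cup after late goal",
    "New smartphone chip boosts speed",
    "Software update fixes phone security",
]
train_labels = ["business", "business", "sport", "sport", "tech", "tech"]


def build_model(classifier):
    """Chain lowercasing, stop-word removal and word counting with a classifier."""
    vectorizer = CountVectorizer(lowercase=True, stop_words="english")
    return make_pipeline(vectorizer, classifier)


# Three of the NB variants named in the abstract, with default settings (assumed).
variants = {
    "Multinomial": MultinomialNB(),
    "Complement": ComplementNB(),
    "Bernoulli": BernoulliNB(),
}

new_headlines = [
    "Bank profits rally as markets rise",
    "Late goal wins cup final",
]

# Fit each variant on the toy data and record its predictions.
predictions = {}
for name, classifier in variants.items():
    model = build_model(classifier).fit(train_headlines, train_labels)
    predictions[name] = model.predict(new_headlines).tolist()
    print(name, predictions[name])
```

On a real dataset, the accuracy of each variant would be estimated on a held-out test split or by cross-validation instead of on two hand-picked headlines.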
- Published
- 2025