Rigorous non-disjoint discretization for naive Bayes.

Authors :: Zhang, Huan
Jiang, Liangxiao
Webb, Geoffrey I.
Source :: Pattern Recognition. Aug2023, Vol. 140, pN.PAG-N.PAG. 1p.
Publication Year :: 2023
Highlights ::
• Discretization is commonly used in naive Bayes.
• Few approaches consider the effect of multiple occurrences of a single value.
• We propose Rigorous Non-Disjoint Discretization (RNDD) in this paper.
• Extensive experimental results validate the effectiveness of RNDD.
Abstract: Naive Bayes is a classical machine learning algorithm for which discretization is commonly used to transform quantitative attributes into qualitative attributes. Of numerous discretization methods, Non-Disjoint Discretization (NDD) proposes a novel perspective by forming overlapping intervals and always locating a value toward the middle of an interval. However, existing approaches to NDD fail to adequately consider the effect of multiple occurrences of a single value, a commonly occurring circumstance in practice. By necessity, all occurrences of a single value fall within the same interval. As a result, it is often not possible to discretize an attribute into intervals containing equal numbers of training instances. Current methods address this issue in an ad hoc manner, reducing the specificity of the resulting atomic intervals. In this study, we propose a non-disjoint discretization method for naive Bayes (NB), called Rigorous Non-Disjoint Discretization (RNDD), that handles multiple occurrences of a single value in a systematic manner. Our extensive experimental results suggest that RNDD significantly outperforms NDD along with all other existing state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
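The constraint described in the abstract, that all occurrences of a single value must fall within the same interval, can be illustrated with a small sketch. This is not the RNDD or NDD algorithm, neither of which is specified in this record. It is a plain equal-frequency binning routine, written under the assumption that an interval may only be closed at a boundary between two distinct values. The function name `equal_frequency_bins` and the sample data are hypothetical.

```python
def equal_frequency_bins(values, k):
    """Illustrative equal-frequency binning (hypothetical helper, not RNDD).

    Aims for k bins of about len(values) / k instances each, but never
    separates equal values into different bins.
    """
    ordered = sorted(values)
    n = len(ordered)
    target = n / k  # ideal number of instances per bin
    bins, current = [], []
    for i, v in enumerate(ordered):
        current.append(v)
        is_last = i == n - 1
        # Close the bin only when it is full enough AND the next value differs,
        # so that all copies of a value stay together.
        if (not is_last and len(bins) < k - 1
                and len(current) >= target and ordered[i + 1] != v):
            bins.append(current)
            current = []
    bins.append(current)
    return bins


# Distinct values: three bins of three instances each.
distinct = equal_frequency_bins([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)

# Repeated value 2: the first bin cannot close until the run of 2s ends,
# so the bins come out unequal and fewer than requested.
repeated = equal_frequency_bins([1, 2, 2, 2, 2, 2, 3, 4, 5], 3)

print([len(b) for b in distinct])
print([len(b) for b in repeated])
```

With the repeated value, the routine yields two bins of sizes 6 and 3 instead of three bins of size 3, which is the kind of imbalance the abstract says existing methods handle in an ad hoc manner.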