
Unsupervised Learning Informational Limit in Case of Sparsely Described Examples.

Authors :
Bock, H.-H.
Gaul, W.
Vichi, M.
Arabie, Ph.
Baier, D.
Critchley, F.
Decker, R.
Diday, E.
Greenacre, M.
Lauro, C.
Meulman, J.
Monari, P.
Nishisato, S.
Ohsumi, N.
Opitz, O.
Ritter, G.
Schader, M.
Weihs, C.
Brito, Paula
Cucumel, Guy
Source :
Selected Contributions in Data Analysis & Classification; 2007, p345-355, 11p
Publication Year :
2007

Abstract

This paper presents a model characterizing unsupervised learning from an information-theoretic point of view. Under some hypotheses, it defines a theoretical quality criterion, which corresponds to the informational limit that bounds the learning ability of any clustering algorithm. This quality criterion depends on the information content of the learning set. It is relevant when examples are sparsely described, i.e., when most of the descriptors are missing. This theoretical limit of any unsupervised learning algorithm is then compared to the actual learning quality of different clustering algorithms (EM, COBWEB and PRESS). The empirical comparison uses artificial data sets that are randomly degraded. Finally, the paper shows that the results of PRESS, an algorithm specifically designed to learn from sparsely described examples, are very close to the theoretical upper-bound quality. [ABSTRACT FROM AUTHOR]
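The degradation protocol the abstract describes (artificial clustered data, random removal of descriptor values, clustering quality measured against the known partition) can be illustrated in a few lines. The sketch below is not the authors' code: it stands in EM via scikit-learn's GaussianMixture, fills the masked entries by mean imputation, and scores recovery with the adjusted Rand index. The masking rates, data shape, and imputation strategy are all assumptions for illustration, and PRESS itself is not reproduced here.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.impute import SimpleImputer
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Artificial learning set: 3 true clusters over 10 descriptors (assumed shape).
X, y_true = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)

for rate in (0.0, 0.5, 0.9):  # fraction of descriptor values removed
    X_degraded = X.copy()
    X_degraded[rng.random(X.shape) < rate] = np.nan  # random degradation

    # Mean imputation is a simplification; GaussianMixture stands in for EM.
    X_filled = SimpleImputer(strategy="mean").fit_transform(X_degraded)
    labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X_filled)

    # Agreement with the true partition; the paper's own quality criterion differs.
    print(f"missing rate {rate:.0%}: ARI = {adjusted_rand_score(y_true, labels):.3f}")

As the missing rate grows, the adjusted Rand index falls toward zero, which is the qualitative behavior the paper's informational limit is meant to bound from above.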

Details

Language :
English
ISBNs :
9783540735588
Database :
Supplemental Index
Journal :
Selected Contributions in Data Analysis & Classification
Publication Type :
Book
Accession number :
33315448
Full Text :
https://doi.org/10.1007/978-3-540-73560-1_32