1. OvNMTF Algorithm: an Overlapping Non-Negative Matrix Tri-Factorization for Coclustering
- Author
Lucas Fernandes Brunialti, Waldyr Lourenco de Freitas, Sarajane Marques Peres, and Valdinei Freire
- Subjects
Computer science, 05 social sciences, 02 engineering and technology, Row and column spaces, Column (database), Matrix decomposition, Matrix (mathematics), Factorization, 0202 electrical engineering, electronic engineering, information engineering, Cluster (physics), 020201 artificial intelligence & image processing, 0509 other social sciences, 050904 information & library sciences, Cluster analysis, Row, Algorithm
- Abstract
Coclustering algorithms are an alternative to classic one-sided clustering algorithms. Because of its ability to simultaneously cluster rows and columns of a dyadic data matrix, coclustering offers higher value-added information: it provides column clusters in addition to row clusters, and the relationship between them in terms of coclusters. Different structures of coclusters are possible, and those that overlap in terms of rows or columns still represent an open question with room for improvement. In addition, while most related literature cites coclustering as a means of producing better results from one-sided clustering, few initiatives study it as a tool capable of providing higher-quality descriptive information about this clustering. In this paper, we present a new coclustering algorithm, OvNMTF, based on triple matrix factorization, which properly handles overlapped coclusters by adding degrees of freedom to the matrix factorization that enable the discovery of specialized column clusters for each row cluster. As a proof of concept, we modeled text analysis as a coclustering problem with column overlaps, assuming that given words (data matrix columns) are associated with more than one document cluster (row cluster) because they can assume different semantic relationships in each association. Experiments on synthetic data sets show the reasonableness of the OvNMTF algorithm; experiments on real-world text data show its power for extracting high-quality information.
- Published
- 2020
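The abstract's central idea, a tri-factorization in which each row cluster gets its own specialized column clusters, can be illustrated with a small reconstruction sketch. This is only one plausible reading of the abstract, not the paper's formulation: the function name `reconstruct`, the matrix names `U`, `S`, `V_list`, their shapes, and the toy values are all assumptions, and the update rules that would actually learn the factors are not shown.

```python
import numpy as np

def reconstruct(U, S, V_list):
    """Rebuild a data matrix from hypothetical overlapping tri-factors.

    U      : (n, k) non-negative row-cluster memberships.
    S      : (k, l) non-negative weights linking row and column clusters.
    V_list : k matrices of shape (m, l); V_list[p] holds the column
             clusters specialized for row cluster p (an assumption
             based on the abstract's description).
    """
    n, k = U.shape
    m = V_list[0].shape[0]
    X_hat = np.zeros((n, m))
    for p in range(k):
        # Column profile of row cluster p, built from its own column clusters.
        profile = S[p, :] @ V_list[p].T          # shape (m,)
        # Rows belonging to cluster p receive that profile.
        X_hat += np.outer(U[:, p], profile)
    return X_hat

# Toy example: 3 rows (documents), 4 columns (words), 2 row clusters.
U = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])
# Column 1 is grouped differently for each row cluster (a column overlap).
V0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
V1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])

X_hat = reconstruct(U, S, [V0, V1])
print(X_hat)
```

In this toy setup column 1 is non-zero in the rows of both row clusters, which a single shared column-cluster matrix with hard assignments could not express; that is the kind of extra freedom the abstract refers to.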