Mixture Model and MDSDCA for Textual Data

Authors :
Mohamed Nadif
Benoît Otjacques
Faryel Allouti
Le Thi Hoai An
LIPADE, Paris Descartes University and Sorbonne Paris Cité University, France
Laboratoire d'Informatique Théorique et Appliquée (LITA)
Université de Lorraine (UL)
Centre de Recherche Public Gabriel Lippmann, Department of Informatics (CRPGL)
Centre de Recherche Public - Gabriel Lippmann (LUXEMBOURG)
Source :
Cooperative Design, Visualization, and Engineering. Lecture Notes in Computer Science, 5738, pp.240-244, 2009, ⟨10.1007/978-3-642-04265-2_35⟩, ISBN: 9783642042645, CDVE
Publication Year :
2009
Publisher :
HAL CCSD, 2009.

Abstract

E-mailing has become an essential component of cooperation in business. Consequently, the large number of messages, whether manually produced or automatically generated, can rapidly cause information overflow for users. Many research projects have examined this issue, but surprisingly few have tackled the problem of files attached to e-mails, which in many cases carry a substantial part of the semantics of the message. This paper considers this specific topic and focuses on the clustering and visualization of attached files. Relying on the multinomial mixture model, we use the Classification EM algorithm (CEM) to cluster the set of files, and MDSDCA to visualize the resulting classes of documents. Like the Multidimensional Scaling method, the MDSDCA algorithm, which is based on the Difference of Convex functions, aims to optimize the stress criterion. As MDSDCA is iterative, we propose an initialization approach that avoids starting from random values. Experiments are conducted on simulated and textual data.
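
To make the clustering step in the abstract concrete, the following is a minimal sketch of Classification EM for a multinomial mixture over term counts. It is not the authors' implementation: the function name `cem_multinomial`, the smoothing constant `alpha`, the optional `init_labels` argument, and the random initial partition are all assumptions made for illustration.

```python
import math
import random


def cem_multinomial(X, k, n_iter=50, alpha=1e-2, init_labels=None, seed=0):
    """Hypothetical CEM sketch for a mixture of multinomials.

    X is a list of term-count rows (documents x vocabulary) and k is the
    number of clusters. alpha is an assumed smoothing constant that keeps
    every estimated probability strictly positive.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Initial hard partition: user-supplied, else random (an assumption).
    if init_labels is not None:
        labels = list(init_labels)
    else:
        labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(n_iter):
        # M-step: estimate proportions and term probabilities per cluster
        # from the current hard partition.
        sizes = [0] * k
        totals = [[alpha] * d for _ in range(k)]
        for row, z in zip(X, labels):
            sizes[z] += 1
            for j, count in enumerate(row):
                totals[z][j] += count
        log_pi = [math.log((s + alpha) / (n + k * alpha)) for s in sizes]
        log_theta = []
        for t in totals:
            total = sum(t)
            log_theta.append([math.log(v / total) for v in t])
        # E-step and C-step fused: assign each document to the cluster with
        # the largest log posterior (up to an additive constant).
        new_labels = []
        for row in X:
            scores = [
                log_pi[c] + sum(x * log_theta[c][j] for j, x in enumerate(row))
                for c in range(k)
            ]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:
            break  # the partition is stable, so the algorithm has converged
        labels = new_labels
    return labels


# Toy term-count matrix: two groups of documents with disjoint dominant terms.
docs = [
    [9, 1, 0, 0], [8, 2, 0, 0], [10, 0, 0, 0],
    [0, 0, 9, 1], [0, 1, 8, 1], [0, 0, 10, 0],
]
labels = cem_multinomial(docs, k=2, init_labels=[0, 1, 0, 1, 0, 1])
```

Unlike standard EM, each iteration replaces the soft posterior weights with a hard assignment before re-estimating the parameters, which is what distinguishes the classification variant.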

Details

Language :
English
ISBN :
978-3-642-04264-5
ISBNs :
9783642042645
Database :
OpenAIRE
Journal :
Cooperative Design, Visualization, and Engineering. Lecture Notes in Computer Science, 5738, pp.240-244, 2009, ⟨10.1007/978-3-642-04265-2_35⟩, ISBN: 9783642042645, CDVE
Accession number :
edsair.doi.dedup.....b3191dce3c4ab0597af07abcfd6f47c9
Full Text :
https://doi.org/10.1007/978-3-642-04265-2_35