Sys-TM: A Fast and General Topic Modeling System.

Authors :
Shao, Yingxia
Li, Xupeng
Chen, Yiru
Yu, Lele
Cui, Bin
Source :
IEEE Transactions on Knowledge & Data Engineering. Jun 2021, Vol. 33 Issue 6, p2790-2802. 13p.
Publication Year :
2021

Abstract

Topic models, such as LDA and its variants, are popular probabilistic models for discovering the abstract “topics” that occur in a collection of documents. However, the performance of topic models may vary a lot for different workloads, and it is not a trivial task to achieve a well-optimized implementation. In this paper, we systematically study all recently proposed samplers over LDA: AliasLDA, F+LDA, LightLDA, and WarpLDA, and discover a novel system tradeoff by considering the diversity and skewness of workloads. Then, we propose a hybrid sampler which can cleverly choose an efficient sampler with the tradeoff, and apply the hybrid sampler to LDA and its variants, including STM, TOT and CTM. Finally, we build a fast and general topic modeling system Sys-TM, which provides a unified topic modeling framework by integrating the hybrid sampler. Based on our empirical studies, the hybrid sampler outperforms the state-of-the-art samplers by up to 2× over various topic models, and with carefully engineered implementation, Sys-TM is able to outperform the existing systems by up to 10×. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
10414347
Volume :
33
Issue :
6
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
150287532
Full Text :
https://doi.org/10.1109/TKDE.2019.2956518