Sys-TM: A Fast and General Topic Modeling System.
- Source :
- IEEE Transactions on Knowledge & Data Engineering. Jun 2021, Vol. 33 Issue 6, p2790-2802. 13p.
- Publication Year :
- 2021
Abstract
- Topic models, such as LDA and its variants, are popular probabilistic models for discovering the abstract “topics” that occur in a collection of documents. However, the performance of topic models may vary a lot for different workloads, and it is not a trivial task to achieve a well-optimized implementation. In this paper, we systematically study all recently proposed samplers over LDA: AliasLDA, F+LDA, LightLDA, and WarpLDA, and discover a novel system tradeoff by considering the diversity and skewness of workloads. Then, we propose a hybrid sampler which can cleverly choose an efficient sampler with the tradeoff, and apply the hybrid sampler to LDA and its variants, including STM, TOT and CTM. Finally, we build a fast and general topic modeling system Sys-TM, which provides a unified topic modeling framework by integrating the hybrid sampler. Based on our empirical studies, the hybrid sampler outperforms the state-of-the-art samplers by up to 2× over various topic models, and with carefully engineered implementation, Sys-TM is able to outperform the existing systems by up to 10×. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 10414347
- Volume :
- 33
- Issue :
- 6
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Knowledge & Data Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 150287532
- Full Text :
- https://doi.org/10.1109/TKDE.2019.2956518