Improving Topic Models with Latent Feature Word Representations
October 15, 2018 · Entered Twilight · Transactions of the Association for Computational Linguistics
"Last commit was 8.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: License.txt, README.md, build.xml, jar, lib, src, test
Authors
Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson
arXiv ID
1810.06306
Category
cs.CL: Computation & Language
Cross-listed
cs.IR, cs.LG
Citations
354
Venue
Transactions of the Association for Computational Linguistics
Repository
https://github.com/datquocnguyen/LFTM
⭐ 179
Last Checked
1 month ago
Abstract
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
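The abstract describes extending Dirichlet multinomial topic models with latent feature word vectors: the topic-word distribution becomes a two-component mixture of the usual count-based Dirichlet multinomial term and a latent-feature term defined by topic-embedding/word-embedding dot products. A minimal sketch of that mixture, with illustrative sizes and random stand-in vectors (all variable names, sizes, and the mixture weight `lam` here are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, D = 1000, 20, 50  # vocab size, number of topics, embedding dim (illustrative)

word_vecs = rng.normal(size=(V, D))       # stand-in for pre-trained word embeddings
topic_vecs = rng.normal(size=(K, D))      # stand-in for learned topic embeddings
counts = rng.integers(0, 5, size=(K, V))  # stand-in topic-word counts from sampling
beta, lam = 0.01, 0.6                     # Dirichlet prior; mixture weight (assumed)

def topic_word_dist(t):
    """Mixture of a Dirichlet-multinomial and a latent-feature component."""
    # count-based Dirichlet-multinomial component
    mult = (counts[t] + beta) / (counts[t].sum() + V * beta)
    # latent-feature component: softmax over the vocabulary of topic-word scores
    scores = word_vecs @ topic_vecs[t]
    scores -= scores.max()                # numerical stability
    lf = np.exp(scores) / np.exp(scores).sum()
    return (1 - lam) * mult + lam * lf

p = topic_word_dist(0)  # a valid probability distribution over the vocabulary
```

In the paper's models the latent-feature component lets large-corpus embeddings sharpen topic-word estimates when the target corpus is small or its documents are short; this sketch only shows the shape of the mixture, not the inference procedure.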
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt: Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. · 👻 Ghosted