Improving Topic Models with Latent Feature Word Representations

October 15, 2018 · Entered Twilight · 🏛 Transactions of the Association for Computational Linguistics

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 8.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: License.txt, README.md, build.xml, jar, lib, src, test

Authors: Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson
arXiv ID: 1810.06306
Category: cs.CL: Computation & Language
Cross-listed: cs.IR, cs.LG
Citations: 354
Venue: Transactions of the Association for Computational Linguistics
Repository: https://github.com/datquocnguyen/LFTM ⭐ 179
Last Checked: 1 month ago
Abstract
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
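The abstract's core idea — combining a Dirichlet multinomial topic component with pre-trained latent feature vectors — can be sketched numerically. In the paper's models, each topic's word distribution is a two-component mixture: a conventional per-topic multinomial, and a latent-feature component given by a softmax over dot products between a topic vector and the external word vectors. The sketch below is illustrative only; the vocabulary size, vector dimension, mixture weight, and random vectors are stand-ins, not values or learned parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

V, D = 1000, 50   # vocabulary size, word-vector dimension (illustrative)
lam = 0.6         # mixture weight between the two components (illustrative)

# Stand-ins for pre-trained word vectors (e.g. trained on a large external
# corpus) and a topic vector of the same dimension.
omega = rng.normal(size=(V, D))   # one row per vocabulary word
tau_t = rng.normal(size=D)        # vector for topic t

# Latent-feature component: softmax over dot products omega_w . tau_t
scores = omega @ tau_t
cat_e = np.exp(scores - scores.max())   # subtract max for numerical stability
cat_e /= cat_e.sum()

# Dirichlet-multinomial component: an ordinary per-topic word distribution
phi_t = rng.dirichlet(np.full(V, 0.1))

# Mixture defining P(w | topic t) in the extended model
p_w_given_t = lam * cat_e + (1 - lam) * phi_t
```

Because the latent-feature component is shaped by vectors trained on a much larger corpus, it can smooth the word-topic mapping when the target corpus has few or short documents, which matches the improvements the abstract reports.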

📜 Similar Papers

In the same crypt: Computation & Language

🌅 🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago