Embedding Words as Distributions with a Bayesian Skip-gram Model
November 29, 2017 · Entered Twilight · International Conference on Computational Linguistics
"Last commit was 5.0 years ago (β₯5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, README.md, eval, interfaces, layers, libraries, models, requirements.txt, run_bsg.py
Authors
Arthur Bražinskas, Serhii Havrylov, Ivan Titov
arXiv ID
1711.11027
Category
cs.CL: Computation & Language
Cross-listed
cs.AI, cs.LG
Citations
39
Venue
International Conference on Computational Linguistics
Repository
https://github.com/ixlan/BSG
★ 47
Last Checked
1 month ago
Abstract
We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential 'meanings'. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks.
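To make the generative story in the abstract concrete, below is a minimal PyTorch sketch of the Bayesian Skip-gram objective. This is an illustrative re-implementation under stated assumptions, not the code in the linked repository; the class name `BSGSketch`, the encoder architecture (averaged context encodings fed to linear layers), and the dimensions are all hypothetical. Each word type carries a diagonal-Gaussian prior over meaning vectors; for each occurrence, an inference network maps the center word plus its context to a posterior Gaussian, and the ELBO combines context-word reconstruction with a KL term against the word's prior.

```python
# Minimal Bayesian Skip-gram sketch (assumption: a PyTorch re-implementation,
# not the authors' released code). Names and architecture are illustrative.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class BSGSketch(nn.Module):
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        # Word-specific prior: each word type w gets a Gaussian N(mu_w, sigma_w^2 I).
        self.prior_mu = nn.Embedding(vocab_size, dim)
        self.prior_log_sigma = nn.Embedding(vocab_size, dim)
        # Inference network: maps (center word, context) to a posterior Gaussian.
        self.enc = nn.Embedding(vocab_size, dim)
        self.post_mu = nn.Linear(2 * dim, dim)
        self.post_log_sigma = nn.Linear(2 * dim, dim)
        # Decoder: scores context words from a sampled meaning vector z.
        self.out = nn.Linear(dim, vocab_size)

    def posterior(self, center, context):
        # center: (batch,) word ids; context: (batch, window) word ids.
        # Average the context encodings and combine with the center word.
        h = torch.cat([self.enc(center), self.enc(context).mean(dim=1)], dim=-1)
        return Normal(self.post_mu(h), self.post_log_sigma(h).exp())

    def elbo(self, center, context):
        q = self.posterior(center, context)                       # q(z | w, c)
        p = Normal(self.prior_mu(center),                          # p(z | w)
                   self.prior_log_sigma(center).exp())
        z = q.rsample()                                            # reparameterization trick
        # Reconstruction: log-likelihood of each observed context word under z.
        log_px = torch.log_softmax(self.out(z), dim=-1)            # (batch, vocab)
        rec = log_px.gather(1, context).sum(dim=1)                 # (batch,)
        return rec - kl_divergence(q, p).sum(dim=-1)

# Usage with random ids, maximizing the ELBO by gradient descent:
model = BSGSketch(vocab_size=5000)
center = torch.randint(0, 5000, (32,))
context = torch.randint(0, 5000, (32, 4))
loss = -model.elbo(center, context).mean()
loss.backward()
```

The context-specific density returned by `posterior(...)` is the posterior the abstract mentions; one plausible way to use it for lexical substitution is to score a candidate substitute by comparing the candidate's prior against the target occurrence's posterior (e.g., via KL divergence, lower being better), in line with the abstract's lexical-substitution use case.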
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. · 👻 Ghosted