Learning Cross-lingual Embeddings from Twitter via Distant Supervision

May 17, 2019 · Declared Dead · 🏛 arXiv.org

📜 CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: README.md

Authors: Jose Camacho-Collados, Yerai Doval, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert
arXiv ID: 1905.07358
Category: cs.CL: Computation & Language
Cross-listed: cs.SI
Citations: 7
Venue: arXiv.org
Repository: https://github.com/pedrada88/crossembeddings-twitter (⭐ 14)
Last Checked: 1 month ago
Abstract
Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding spaces, and that accurate alignments can be obtained even without external bilingual data. In this paper we explore a research direction that has been surprisingly neglected in the literature: leveraging noisy user-generated text to learn cross-lingual embeddings particularly tailored towards social media applications. While the noisiness and informal nature of the social media genre pose additional challenges to cross-lingual embedding methods, we find that they also provide key opportunities due to the abundance of code-switching and the existence of a shared vocabulary of emoji and named entities. Our contribution consists of a very simple post-processing step that exploits these phenomena to significantly improve the performance of state-of-the-art alignment methods.

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL · 🏛 NeurIPS · 📚 166.0K cites · 8 years ago

Died the same way: 📜 Death by README