Subword Semantic Hashing for Intent Classification on Small Datasets

October 16, 2018 ยท Entered Twilight ยท ๐Ÿ› IEEE International Joint Conference on Neural Network

๐ŸŒ… TWILIGHT: Old Age
Predates the code-sharing era โ€” a pioneer of its time

"Last commit was 6.0 years ago (โ‰ฅ5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, .ipynb_checkpoints, README.md, comprehensive_semhash_test.py, datasets, hashembedding, plots, results.csv, semhash_pipeline.ipynb, table, train_test_split.py

Authors Kumar Shridhar, Ayushman Dash, Amit Sahu, Gustav Grund Pihlgren, Pedro Alonso, Vinaychandran Pondenkandath, Gyorgy Kovacs, Foteini Simistira, Marcus Liwicki arXiv ID 1810.07150 Category cs.CL: Computation & Language Citations 32 Venue IEEE International Joint Conference on Neural Network Repository https://github.com/kumar-shridhar/Know-Your-Intent โญ 134 Last Checked 1 month ago
Abstract
In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise by the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and Web Application. Our benchmarks are available online: https://github.com/kumar-shridhar/Know-Your-Intent
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago