Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding

September 24, 2019 · Entered Twilight · 🏛 IEEE International Conference on Acoustics, Speech, and Signal Processing

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: README.md, data, models, requirements.txt, src

Authors: Chao-Wei Huang, Yun-Nung Chen
arXiv ID: 1909.10861
Category: cs.CL (Computation & Language)
Cross-listed: cs.LG, eess.AS
Citations: 45
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository: https://github.com/MiuLab/SpokenVec (⭐ 24)
Last Checked: 1 month ago

Abstract

Employing pre-trained language models (LMs) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by an automatic speech recognizer (ASR) remains a concern. Therefore, this paper focuses on making contextualized representations more robust to ASR errors. We propose a novel confusion-aware fine-tuning method to mitigate the impact of ASR errors on pre-trained LMs. Specifically, we fine-tune LMs to produce similar representations for acoustically confusable words, which are obtained from the word confusion networks (WCNs) produced by the ASR. Experiments on the benchmark ATIS dataset show that the proposed method significantly improves spoken language understanding performance when applied to ASR transcripts. Our source code is available at https://github.com/MiuLab/SpokenVec
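The core idea above, fine-tuning so that acoustically confusable words taken from a WCN end up with similar representations, can be illustrated with a minimal sketch. Everything in it is an assumption rather than the SpokenVec implementation: the WCN layout (a list of bins, each holding competing `(word, posterior)` hypotheses), the `<eps>` null token, the hypothetical `min_posterior` threshold, and the squared-L2 objective are illustrative choices, and a static word-to-vector lookup stands in for the contextualized LM.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Assumed WCN layout: a sequence of bins, each holding competing ASR
# hypotheses as (word, posterior) pairs. Not the repo's actual format.
WCN = List[List[Tuple[str, float]]]


def confusable_pairs(wcn: WCN, min_posterior: float = 0.1) -> List[Tuple[str, str]]:
    """Collect word pairs that compete within the same WCN bin.

    Hypotheses below the (hypothetical) min_posterior threshold are dropped,
    and the null arc "<eps>" is skipped.
    """
    pairs: List[Tuple[str, str]] = []
    for bin_ in wcn:
        words = sorted({w for w, p in bin_ if p >= min_posterior and w != "<eps>"})
        pairs.extend(combinations(words, 2))
    return pairs


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def confusion_loss(pairs: List[Tuple[str, str]], embed: Dict[str, Sequence[float]]) -> float:
    """Mean squared L2 distance between representations of confusable words.

    Minimizing this term pulls confusable words together. Here `embed` is a
    static lookup standing in for the LM being fine-tuned (an assumption).
    """
    if not pairs:
        return 0.0
    return sum(squared_distance(embed[a], embed[b]) for a, b in pairs) / len(pairs)


# Toy example: "flights"/"flight" and "to"/"two" compete in the same bins.
wcn: WCN = [
    [("flights", 0.7), ("flight", 0.25), ("<eps>", 0.05)],
    [("to", 0.9), ("two", 0.1)],
    [("boston", 1.0)],
]
embed = {"flight": [1.0, 0.0], "flights": [1.0, 1.0], "to": [0.0, 2.0], "two": [0.0, 0.0]}

pairs = confusable_pairs(wcn)
print(pairs)                         # [('flight', 'flights'), ('to', 'two')]
print(confusion_loss(pairs, embed))  # 2.5
```

In an actual fine-tuning run, the vectors would be the LM's hidden states for each hypothesis in its sentence context, and gradients from a term like this would update the LM; the paper's exact objective and how it is combined with other losses should be taken from the paper or repository.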