Multilingual Speech Recognition With A Single End-To-End Model

November 06, 2017 · Declared Dead · 🏛 IEEE International Conference on Acoustics, Speech, and Signal Processing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao
arXiv ID: 1711.01694
Category: eess.AS (Audio & Speech)
Cross-listed: cs.AI, cs.CL
Citations: 283
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing
Last Checked: 1 month ago

Abstract

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.
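
The two ideas in the abstract, a shared output vocabulary built as the union of per-language grapheme sets and a language identifier supplied as an extra input feature, can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy transcripts, the approximation of a grapheme as a single Unicode code point, and the choice to append a one-hot language vector to every acoustic frame are all assumptions made for illustration; the abstract does not specify how the identifier is encoded.

```python
from typing import Dict, List, Sequence


def build_union_vocab(corpora: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every symbol seen in any language to one shared integer ID.

    Assumption: a "grapheme" is approximated by a single Unicode code
    point, which is what iterating over a Python str yields.
    """
    symbols = set()
    for transcripts in corpora.values():
        for text in transcripts:
            symbols.update(text)
    # Sort so the ID assignment is deterministic across runs.
    return {symbol: index for index, symbol in enumerate(sorted(symbols))}


def append_language_id(
    frames: Sequence[Sequence[float]], lang: str, languages: Sequence[str]
) -> List[List[float]]:
    """Concatenate a one-hot language vector onto every feature frame.

    Assumption: one-hot concatenation is only one plausible way to pass
    the language identifier as an additional input feature.
    """
    one_hot = [1.0 if candidate == lang else 0.0 for candidate in languages]
    return [list(frame) + one_hot for frame in frames]


# Hypothetical toy data: one transcript each in Hindi and Tamil,
# whose scripts share no characters.
corpora = {
    "hi": ["नमस्ते"],
    "ta": ["வணக்கம்"],
}
vocab = build_union_vocab(corpora)
languages = sorted(corpora)  # fixed order defines the one-hot layout

# Two made-up 2-dimensional acoustic frames for a Tamil utterance.
frames = [[0.1, 0.2], [0.3, 0.4]]
conditioned = append_language_id(frames, "ta", languages)
```

Because the scripts barely overlap, the shared vocabulary is roughly the sum of the per-language inventories, and a single output layer over it lets one model emit text in any of the languages. Conditioning each frame on the language vector gives the network an explicit signal about which part of that vocabulary to favor.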