End-To-End Speech Recognition Using A High Rank LSTM-CTC Based Model
March 12, 2019 · Entered Twilight · IEEE International Conference on Acoustics, Speech, and Signal Processing
"Last commit was 7.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, LICENSE, README.md, bin, conf, egs, local, nnet, pyKaldiIO, scripts
Authors
Yangyang Shi, Mei-Yuh Hwang, Xin Lei
arXiv ID
1903.05261
Category
cs.CL: Computation & Language
Citations
15
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository
https://github.com/mobvoi/lstm_ctc
⭐ 38
Last Checked
1 month ago
Abstract
Long Short Term Memory Connectionist Temporal Classification (LSTM-CTC) based end-to-end models are widely used in speech recognition due to their simplicity in training and efficiency in decoding. In conventional LSTM-CTC based models, a bottleneck projection matrix maps the hidden feature vectors obtained from the LSTM to the softmax output layer. In this paper, we propose to use a high rank projection layer to replace the projection matrix. The output of the high rank projection layer is a weighted combination of vectors that are projected from the hidden feature vectors via different projection matrices and a non-linear activation function. The high rank projection layer improves the expressiveness of LSTM-CTC models. The experimental results show that on the Wall Street Journal (WSJ) corpus and the LibriSpeech dataset, the proposed method achieves 4%-6% relative word error rate (WER) reduction over the baseline CTC system. The proposed models also outperform other published CTC based end-to-end (E2E) models under the condition that no external data or data augmentation is applied. Code has been made available at https://github.com/mobvoi/lstm_ctc.
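The layer described in the abstract amounts to a weighted mixture of several non-linearly activated projections of the LSTM output, in place of a single bottleneck matrix. Below is a minimal PyTorch sketch of that idea; the class name `HighRankProjection`, the `num_heads` parameter, the `tanh` activation, and the choice to predict mixture weights from the same hidden vector are illustrative assumptions, not taken from the released repository, which may be implemented differently (e.g. in TensorFlow/Kaldi-style tooling).

```python
import torch
import torch.nn as nn

class HighRankProjection(nn.Module):
    """Sketch of a high rank projection layer: the softmax input is a
    weighted combination of K non-linearly activated projections of the
    LSTM hidden vector (names and defaults here are assumptions)."""

    def __init__(self, hidden_dim: int, num_labels: int, num_heads: int = 4):
        super().__init__()
        # K independent projection matrices replacing the single bottleneck matrix
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_dim, num_labels) for _ in range(num_heads)]
        )
        # Mixture weights predicted from the same hidden vector (assumption)
        self.mixture = nn.Linear(hidden_dim, num_heads)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, time, hidden_dim) LSTM outputs
        weights = torch.softmax(self.mixture(h), dim=-1)            # (B, T, K)
        heads = torch.stack(
            [torch.tanh(proj(h)) for proj in self.projections], dim=-1
        )                                                            # (B, T, L, K)
        logits = (heads * weights.unsqueeze(2)).sum(dim=-1)          # (B, T, L)
        return logits  # passed through log_softmax and into the CTC loss
```

In this sketch the resulting logits feed the standard CTC log-softmax, so the rest of the training and decoding pipeline is unchanged; only the final projection is replaced.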
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
👻
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
👻
Ghosted