Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
December 03, 2019 · Declared Dead · 🏛 IEEE International Conference on Acoustics, Speech, and Signal Processing
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors
Shaoshi Ling, Yuzong Liu, Julian Salazar, Katrin Kirchhoff
arXiv ID
1912.01679
Category
eess.AS: Audio & Speech
Cross-listed
cs.CL, cs.LG, cs.SD
Citations
145
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Last Checked
1 month ago
Abstract
We propose a novel approach to semi-supervised automatic speech recognition (ASR). We first exploit a large amount of unlabeled audio data via representation learning, where we reconstruct a temporal slice of filterbank features from past and future context frames. The resulting deep contextualized acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end ASR system using a smaller amount of labeled audio data. In our experiments, we show that systems trained on DeCoAR consistently outperform ones trained on conventional filterbank features, giving 42% and 19% relative improvement over the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our approach can drastically reduce the amount of labeled data required; unsupervised training on LibriSpeech then supervision with 100 hours of labeled data achieves performance on par with training on all 960 hours directly. Pre-trained models and code will be released online.
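The abstract says only that a temporal slice of filterbank features is reconstructed from past and future context frames. The plain-Python sketch below illustrates that data setup. It is not the authors' network or code, which the entry above reports as unreleased. The slice width `k`, the L1 reconstruction loss, and the linear-interpolation "model" (`interpolate_slice`) are all assumptions chosen for illustration. A trained encoder would take the place of the hypothetical interpolation helper.

```python
from typing import List, Tuple

# One filterbank frame = a vector of per-bin energies (dimension is illustrative).
Frame = List[float]
Example = Tuple[List[Frame], List[Frame], List[Frame]]


def slice_examples(feats: List[Frame], k: int) -> List[Example]:
    """Split one utterance into (past, target_slice, future) triples.

    For each start index t, feats[t:t+k] is the slice to reconstruct,
    feats[:t] is the past context and feats[t+k:] the future context.
    """
    if not 0 < k < len(feats):
        raise ValueError("need 0 < k < number of frames")
    return [(feats[:t], feats[t:t + k], feats[t + k:])
            for t in range(len(feats) - k + 1)]


def interpolate_slice(past: List[Frame], future: List[Frame], k: int) -> List[Frame]:
    """Hypothetical stand-in for a learned encoder: linearly interpolate
    between the last past frame and the first future frame. When one side
    is empty, the other boundary frame is repeated."""
    left = past[-1] if past else future[0]
    right = future[0] if future else past[-1]
    pred = []
    for i in range(1, k + 1):
        w = i / (k + 1)  # fractional position of frame i inside the gap
        pred.append([(1 - w) * a + w * b for a, b in zip(left, right)])
    return pred


def l1_loss(pred: List[Frame], target: List[Frame]) -> float:
    """Sum of absolute errors over all frames and bins (assumed loss)."""
    return sum(abs(p - t) for pf, tf in zip(pred, target) for p, t in zip(pf, tf))


if __name__ == "__main__":
    # Toy 1-bin "utterance" whose energy rises linearly over 6 frames.
    utt = [[float(v)] for v in range(6)]
    examples = slice_examples(utt, k=2)
    total = sum(l1_loss(interpolate_slice(p, f, 2), s) for p, s, f in examples)
    print(len(examples), round(total, 6))  # prints: 5 6.0
```

On this linear toy signal, the interior slices are reconstructed exactly. The whole loss comes from the two edge slices, where one side of the context is missing. This informally shows why conditioning on both past and future frames is useful. Per the abstract, the learned representations (not the reconstructions) are what feed the CTC-based ASR system.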
📜 Similar Papers
In the same crypt — Audio & Speech
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition · 👻 Ghosted
DiffWave: A Versatile Diffusion Model for Audio Synthesis · 👻 Ghosted
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech · 👻 Ghosted
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis · 👻 Ghosted
Generalized End-to-End Loss for Speaker Verification · 👻 Ghosted
Died the same way — ⏳ Coming Soon™
Exploring Simple Siamese Representation Learning · ⏳ Coming Soon™
An Analysis of Scale Invariance in Object Detection - SNIP · ⏳ Coming Soon™
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection · ⏳ Coming Soon™