Rethinking Evaluation in ASR: Are Our Models Robust Enough?
October 22, 2020 ยท Declared Dead ยท ๐ Interspeech
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve
arXiv ID
2010.11745
Category
cs.LG: Machine Learning
Cross-listed
cs.CL,
cs.SD,
eess.AS
Citations
106
Venue
Interspeech
Last Checked
3 months ago
Abstract
Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets - in particular, if models trained on a single dataset transfer to other (possibly out-of-domain) datasets. We show that, in general, reverberative and additive noise augmentation improves generalization performance across domains. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world noisy data. Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive performance on both research and real-world benchmarks.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
๐ป
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
๐ป
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
๐ป
Ghosted
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
R.I.P.
๐ป
Ghosted
A Unified Approach to Interpreting Model Predictions
R.I.P.
๐ป
Ghosted