KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos
March 01, 2019 · Entered Twilight · Conference on Empirical Methods in Natural Language Processing
"Last commit was 6.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, Dockerfile, LICENSE, README.md, crawler, requirements.txt, webdemo
Authors
Egor Lakomkin, Sven Magg, Cornelius Weber, Stefan Wermter
arXiv ID
1903.00216
Category
cs.CL: Computation & Language
Cross-listed
cs.LG,
cs.SD,
eess.AS
Citations
20
Venue
Conference on Empirical Methods in Natural Language Processing
Repository
https://github.com/EgorLakomkin/KTSpeechCrawler
⭐ 156
Last Checked
1 month ago
Abstract
In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps, which extract samples that can be used for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, containing an estimated 3.5% word error rate in the transcriptions. Automatically collected samples contain reading and spontaneous speech recorded in various conditions including background noise and music, distant microphone recordings, and a variety of accents and reverberation. When training a deep neural network on speech recognition, we observed around 40% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set. The demo (http://emnlp-demo.lakomkin.me/) and the crawler code (https://github.com/EgorLakomkin/KTSpeechCrawler) are publicly available.
Similar Papers
In the same crypt: Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
Ghosted
Language Models are Few-Shot Learners
R.I.P.
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
Ghosted