| 1 | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition<br>Daniel S. Park, William Chan, ... (+5 more) | 👻 Ghosted | eess.AS | 3.9K | 6 years ago |
| 2 | Tacotron: Towards End-to-End Speech Synthesis<br>Yuxuan Wang, RJ Skerry-Ryan, ... (+12 more) | 👻 Ghosted | cs.CL | 2.0K | 9 years ago |
| 3 | ESPnet: End-to-End Speech Processing Toolkit<br>Shinji Watanabe, Takaaki Hori, ... (+10 more) | 👻 Ghosted | cs.CL | 1.7K | 8 years ago |
| 4 | SEGAN: Speech Enhancement Generative Adversarial Network<br>Santiago Pascual, Antonio Bonafonte, Joan Serrà | 👻 Ghosted | cs.LG | 1.3K | 9 years ago |
| 5 | Unsupervised Cross-lingual Representation Learning for Speech Recognition<br>Alexis Conneau, Alexei Baevski, ... (+3 more) | 👻 Ghosted | cs.CL | 927 | 5 years ago |
| 6 | ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection<br>Massimiliano Todisco, Xin Wang, ... (+8 more) | 👻 Ghosted | eess.AS | 736 | 6 years ago |
| 7 | The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines<br>Jon Barker, Shinji Watanabe, ... (+2 more) | 👻 Ghosted | cs.SD | 714 | 8 years ago |
| 8 | Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling<br>Bing Liu, Ian Lane | 👻 Ghosted | cs.CL | 708 | 9 years ago |
| 9 | MLS: A Large-Scale Multilingual Dataset for Speech Research<br>Vineel Pratap, Qiantong Xu, ... (+3 more) | 👻 Ghosted | eess.AS | 696 | 5 years ago |
| 10 | WHAM!: Extending Speech Separation to Noisy Environments<br>Gordon Wichern, Joe Antognini, ... (+6 more) | 👻 Ghosted | cs.SD | 454 | 6 years ago |
| 11 | Single-Channel Multi-Speaker Separation using Deep Clustering<br>Yusuf Isik, Jonathan Le Roux, ... (+3 more) | 👻 Ghosted | cs.LG | 447 | 9 years ago |
| 12 | Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition<br>Haşim Sak, Andrew Senior, ... (+2 more) | 👻 Ghosted | cs.CL | 441 | 10 years ago |
| 13 | An Unsupervised Autoregressive Model for Speech Representation Learning<br>Yu-An Chung, Wei-Ning Hsu, ... (+2 more) | 🌅 Old Age | cs.CL | 425 | 6 years ago |
| 14 | VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking<br>Quan Wang, Hannah Muckenhirn, ... (+8 more) | 👻 Ghosted | eess.AS | 413 | 7 years ago |
| 15 | A Fully Convolutional Neural Network for Speech Enhancement<br>Se Rim Park, Jinwon Lee | 👻 Ghosted | cs.LG | 391 | 9 years ago |
| 16 | Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks<br>Ying Zhang, Mohammad Pezeshki, ... (+4 more) | 👻 Ghosted | cs.CL | 383 | 9 years ago |
| 17 | Towards better decoding and language model integration in sequence to sequence models<br>Jan Chorowski, Navdeep Jaitly | 👻 Ghosted | cs.NE | 381 | 9 years ago |
| 18 | Speech Model Pre-training for End-to-End Spoken Language Understanding<br>Loren Lugosch, Mirco Ravanelli, ... (+3 more) | 👻 Ghosted | eess.AS | 378 | 6 years ago |
| 19 | English Conversational Telephone Speech Recognition by Humans and Machines<br>George Saon, Gakuto Kurata, ... (+10 more) | 👻 Ghosted | cs.CL | 371 | 9 years ago |
| 20 | Sequence-to-Sequence Models Can Directly Translate Foreign Speech<br>Ron J. Weiss, Jan Chorowski, ... (+3 more) | 👻 Ghosted | cs.CL | 363 | 9 years ago |
| 21 | Combining Residual Networks with LSTMs for Lipreading<br>Themos Stafylakis, Georgios Tzimiropoulos | 👻 Ghosted | cs.CV | 335 | 9 years ago |
| 22 | Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks<br>Chin-Cheng Hsu, Hsin-Te Hwang, ... (+3 more) | 👻 Ghosted | cs.CL | 325 | 8 years ago |
| 23 | Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition<br>Hagen Soltau, Hank Liao, Hasim Sak | 👻 Ghosted | cs.CL | 316 | 9 years ago |
| 24 | Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM<br>Takaaki Hori, Shinji Watanabe, ... (+2 more) | 👻 Ghosted | cs.CL | 306 | 8 years ago |
| 25 | Cold Fusion: Training Seq2Seq Models Together with Language Models<br>Anuroop Sriram, Heewoo Jun, ... (+2 more) | 👻 Ghosted | cs.CL | 301 | 8 years ago |
| 26 | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context<br>Wei Han, Zhengdong Zhang, ... (+7 more) | 👻 Ghosted | eess.AS | 298 | 5 years ago |
| 27 | STC Antispoofing Systems for the ASVspoof2019 Challenge<br>Galina Lavrentyeva, Sergey Novoselov, ... (+4 more) | 👻 Ghosted | cs.SD | 286 | 6 years ago |
| 28 | Improved training of end-to-end attention models for speech recognition<br>Albert Zeyer, Kazuki Irie, ... (+2 more) | 👻 Ghosted | cs.CL | 279 | 7 years ago |
| 29 | Jasper: An End-to-End Convolutional Neural Acoustic Model<br>Jason Li, Vitaly Lavrukhin, ... (+6 more) | 👻 Ghosted | eess.AS | 278 | 6 years ago |
| 30 | Direct speech-to-speech translation with a sequence-to-sequence model<br>Ye Jia, Ron J. Weiss, ... (+5 more) | 👻 Ghosted | cs.CL | 262 | 6 years ago |
| 31 | Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics<br>Thomas Drugman, Abeer Alwan | 👻 Ghosted | cs.SD | 259 | 6 years ago |
| 32 | RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation<br>Christoph Lüscher, Eugen Beck, ... (+6 more) | 👻 Ghosted | cs.CL | 240 | 6 years ago |
| 33 | Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario<br>Ivan Medennikov, Maxim Korenevsky, ... (+10 more) | 👻 Ghosted | eess.AS | 236 | 5 years ago |
| 34 | Exploring wav2vec 2.0 on speaker verification and language identification<br>Zhiyun Fan, Meng Li, ... (+2 more) | 👻 Ghosted | cs.SD | 231 | 5 years ago |
| 35 | Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech<br>Michael Neumann, Ngoc Thang Vu | 👻 Ghosted | cs.CL | 228 | 8 years ago |
| 36 | End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors<br>Shota Horiguchi, Yusuke Fujita, ... (+3 more) | 👻 Ghosted | eess.AS | 223 | 5 years ago |
| 37 | Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification<br>Daniel Michelsanti, Zheng-Hua Tan | 👻 Ghosted | eess.AS | 221 | 8 years ago |
| 38 | Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning<br>Yu Zhang, Ron J. Weiss, ... (+7 more) | 👻 Ghosted | cs.CL | 203 | 6 years ago |
| 39 | The Second DIHARD Diarization Challenge: Dataset, task, and baselines<br>Neville Ryant, Kenneth Church, ... (+5 more) | 👻 Ghosted | eess.AS | 196 | 6 years ago |
| 40 | Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting<br>Sercan O. Arik, Markus Kliegl, ... (+6 more) | 👻 Ghosted | cs.CL | 191 | 9 years ago |
| 41 | Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder<br>Yu-An Chung, Chao-Chung Wu, ... (+3 more) | 👻 Ghosted | cs.SD | 191 | 10 years ago |
| 42 | Language Modeling with Deep Transformers<br>Kazuki Irie, Albert Zeyer, ... (+2 more) | 👻 Ghosted | cs.CL | 188 | 6 years ago |
| 43 | Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM<br>Szu-Wei Fu, Yu Tsao, ... (+2 more) | 👻 Ghosted | cs.SD | 187 | 7 years ago |
| 44 | Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech<br>Yu-An Chung, James Glass | 👻 Ghosted | cs.CL | 187 | 8 years ago |
| 45 | Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition<br>Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee | 👻 Ghosted | cs.LG | 187 | 9 years ago |
| 46 | Powerset multi-class cross entropy loss for neural speaker diarization<br>Alexis Plaquet, Hervé Bredin | 👻 Ghosted | cs.SD | 185 | 2 years ago |
| 47 | Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition<br>Zhifu Gao, Shiliang Zhang, ... (+2 more) | 👻 Ghosted | cs.SD | 185 | 3 years ago |
| 48 | Temporal Convolution for Real-time Keyword Spotting on Mobile Devices<br>Seungwoo Choi, Seokjun Seo, ... (+6 more) | 👻 Ghosted | cs.SD | 179 | 6 years ago |
| 49 | ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks<br>Cheng-I Lai, Nanxin Chen, ... (+2 more) | 👻 Ghosted | cs.CL | 176 | 6 years ago |
| 50 | Glottal Closure and Opening Instant Detection from Speech Signals<br>Thomas Drugman, Thierry Dutoit | 👻 Ghosted | cs.SD | 170 | 6 years ago |