| 1 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions | Jonathan Shen, Ruoming Pang, ... (+11 more) | 👻 Ghosted | cs.CL | 3.0K | 8 years ago |
| 2 | CNN Architectures for Large-Scale Audio Classification | Shawn Hershey, Sourish Chaudhuri, ... (+11 more) | 👻 Ghosted | cs.SD | 2.8K | 9 years ago |
| 3 | Listen, Attend and Spell | William Chan, Navdeep Jaitly, ... (+2 more) | 👻 Ghosted | cs.CL | 2.4K | 10 years ago |
| 4 | Deep clustering: Discriminative embeddings for segmentation and separation | John R. Hershey, Zhuo Chen, ... (+2 more) | 👻 Ghosted | cs.NE | 1.4K | 10 years ago |
| 5 | End-to-End Attention-based Large Vocabulary Speech Recognition | Dzmitry Bahdanau, Jan Chorowski, ... (+3 more) | 👻 Ghosted | cs.CL | 1.2K | 10 years ago |
| 6 | State-of-the-art Speech Recognition With Sequence-to-Sequence Models | Chung-Cheng Chiu, Tara N. Sainath, ... (+12 more) | 👻 Ghosted | cs.CL | 1.2K | 8 years ago |
| 7 | WaveGlow: A Flow-based Generative Network for Speech Synthesis | Ryan Prenger, Rafael Valle, Bryan Catanzaro | 👻 Ghosted | cs.SD | 1.1K | 7 years ago |
| 8 | Exposing Deep Fakes Using Inconsistent Head Poses | Xin Yang, Yuezun Li, Siwei Lyu | 👻 Ghosted | cs.CV | 1.0K | 7 years ago |
| 9 | Generalized End-to-End Loss for Speaker Verification | Li Wan, Quan Wang, ... (+2 more) | 👻 Ghosted | eess.AS | 1.0K | 8 years ago |
| 10 | Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning | Suyoun Kim, Takaaki Hori, Shinji Watanabe | 👻 Ghosted | cs.CL | 982 | 9 years ago |
| 11 | Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram | Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim | 👻 Ghosted | eess.AS | 950 | 6 years ago |
| 12 | Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation | Dong Yu, Morten Kolbæk, ... (+2 more) | 👻 Ghosted | cs.CL | 929 | 9 years ago |
| 13 | Libri-Light: A Benchmark for ASR with Limited or No Supervision | Jacob Kahn, Morgane Rivière, ... (+13 more) | 👻 Ghosted | cs.CL | 773 | 6 years ago |
| 14 | Attention is All You Need in Speech Separation | Cem Subakan, Mirco Ravanelli, ... (+3 more) | 👻 Ghosted | eess.AS | 721 | 5 years ago |
| 15 | Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos | Huy H. Nguyen, Junichi Yamagishi, Isao Echizen | 👻 Ghosted | cs.CV | 713 | 7 years ago |
| 16 | TasNet: time-domain audio separation network for real-time, single-channel speech separation | Yi Luo, Nima Mesgarani | 👻 Ghosted | cs.SD | 711 | 8 years ago |
| 17 | Streaming End-to-end Speech Recognition For Mobile Devices | Yanzhang He, Tara N. Sainath, ... (+18 more) | 👻 Ghosted | cs.CL | 664 | 7 years ago |
| 18 | End-to-End Text-Dependent Speaker Verification | Georg Heigold, Ignacio Moreno, ... (+2 more) | 👻 Ghosted | cs.LG | 607 | 10 years ago |
| 19 | Clotho: An Audio Captioning Dataset | Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen | 👻 Ghosted | cs.SD | 523 | 6 years ago |
| 20 | Convolutional Recurrent Neural Networks for Music Classification | Keunwoo Choi, George Fazekas, ... (+2 more) | 👻 Ghosted | cs.NE | 518 | 9 years ago |
| 21 | LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Jean-Marc Valin, Jan Skoglund | 👻 Ghosted | eess.AS | 489 | 7 years ago |
| 22 | The Microsoft 2017 Conversational Speech Recognition System | W. Xiong, L. Wu, ... (+4 more) | 👻 Ghosted | cs.CL | 481 | 8 years ago |
| 23 | DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors | Chandan K A Reddy, Vishak Gopal, Ross Cutler | 👻 Ghosted | cs.SD | 461 | 5 years ago |
| 24 | Deep attractor network for single-microphone speaker separation | Zhuo Chen, Yi Luo, Nima Mesgarani | 👻 Ghosted | cs.SD | 420 | 9 years ago |
| 25 | Very Deep Convolutional Networks for End-to-End Speech Recognition | Yu Zhang, William Chan, Navdeep Jaitly | 👻 Ghosted | cs.CL | 419 | 9 years ago |
| 26 | Deep Learning for Joint Source-Channel Coding of Text | Nariman Farsad, Milind Rao, Andrea Goldsmith | 👻 Ghosted | cs.IT | 418 | 8 years ago |
| 27 | FastPitch: Parallel Text-to-speech with Pitch Prediction | Adrian Łańcucki | 👻 Ghosted | eess.AS | 394 | 5 years ago |
| 28 | Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders | Andy T. Liu, Shu-wen Yang, ... (+3 more) | 👻 Ghosted | eess.AS | 393 | 6 years ago |
| 29 | Very Deep Convolutional Neural Networks for Raw Waveforms | Wei Dai, Chia Dai, ... (+3 more) | 👻 Ghosted | cs.SD | 375 | 9 years ago |
| 30 | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Ke Chen, Xingjian Du, ... (+4 more) | 👻 Ghosted | cs.SD | 374 | 4 years ago |
| 31 | Hierarchical Federated Learning Across Heterogeneous Cellular Networks | Mehdi Salehi Heydar Abad, Emre Ozfatura, ... (+2 more) | 👻 Ghosted | cs.LG | 366 | 6 years ago |
| 32 | Utterance-level Aggregation For Speaker Recognition In The Wild | Weidi Xie, Arsha Nagrani, ... (+2 more) | 👻 Ghosted | eess.AS | 365 | 7 years ago |
| 33 | Speaker Diarization with LSTM | Quan Wang, Carlton Downey, ... (+3 more) | 👻 Ghosted | eess.AS | 341 | 8 years ago |
| 34 | Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings | Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen | 👻 Ghosted | cs.SD | 334 | 10 years ago |
| 35 | Capsule Networks for Brain Tumor Classification based on MRI Images and Course Tumor Boundaries | Parnian Afshar, Konstantinos N. Plataniotis, Arash Mohammadi | 👻 Ghosted | cs.CV | 314 | 7 years ago |
| 36 | Federated Learning for Keyword Spotting | David Leroy, Alice Coucke, ... (+3 more) | 👻 Ghosted | eess.AS | 310 | 7 years ago |
| 37 | Highway Long Short-Term Memory RNNs for Distant Speech Recognition | Yu Zhang, Guoguo Chen, ... (+4 more) | 👻 Ghosted | cs.NE | 295 | 10 years ago |
| 38 | Beamforming Optimization for Intelligent Reflecting Surface with Discrete Phase Shifts | Qingqing Wu, Rui Zhang | 👻 Ghosted | cs.IT | 293 | 7 years ago |
| 39 | Learning to Invert: Signal Recovery via Deep Convolutional Networks | Ali Mousavi, Richard G. Baraniuk | 👻 Ghosted | stat.ML | 293 | 9 years ago |
| 40 | The Microsoft 2016 Conversational Speech Recognition System | W. Xiong, J. Droppo, ... (+6 more) | 👻 Ghosted | cs.CL | 291 | 9 years ago |
| 41 | Multilingual Speech Recognition With A Single End-To-End Model | Shubham Toshniwal, Tara N. Sainath, ... (+5 more) | 👻 Ghosted | eess.AS | 283 | 8 years ago |
| 42 | Compressed Sensing Based Multi-User Millimeter Wave Systems: How Many Measurements Are Needed? | Ahmed Alkhateeb, Geert Leus, Robert W. Heath | 👻 Ghosted | cs.IT | 281 | 10 years ago |
| 43 | Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara | 👻 Ghosted | cs.SD | 277 | 8 years ago |
| 44 | Achievable Rate maximization by Passive Intelligent Mirrors | Chongwen Huang, Alessio Zappone, ... (+2 more) | 👻 Ghosted | cs.IT | 276 | 7 years ago |
| 45 | An analysis of incorporating an external language model into a sequence-to-sequence model | Anjuli Kannan, Yonghui Wu, ... (+4 more) | 👻 Ghosted | eess.AS | 275 | 8 years ago |
| 46 | Yedrouj-Net: An efficient CNN for spatial steganalysis | Mehdi Yedroudj, Frederic Comby, Marc Chaumont | 👻 Ghosted | cs.CV | 264 | 8 years ago |
| 47 | Transformer-based Acoustic Modeling for Hybrid Speech Recognition | Yongqiang Wang, Abdelrahman Mohamed, ... (+11 more) | 👻 Ghosted | cs.CL | 259 | 6 years ago |
| 48 | Self-Training for End-to-End Speech Recognition | Jacob Kahn, Ann Lee, Awni Hannun | 👻 Ghosted | cs.CL | 254 | 6 years ago |
| 49 | Deep Residual Learning for Small-Footprint Keyword Spotting | Raphael Tang, Jimmy Lin | 👻 Ghosted | cs.CL | 253 | 8 years ago |
| 50 | Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset | Kun Zhou, Berrak Sisman, ... (+2 more) | 👻 Ghosted | cs.SD | 245 | 5 years ago |