| 51 | Learning latent representations for style control and transfer in end-to-end speech synthesis<br>Ya-Jie Zhang, Shifeng Pan, ... (+2 more) | 👻 Ghosted | cs.CL | 242 | 7 years ago |
| 52 | APE-GAN: Adversarial Perturbation Elimination with GAN<br>Shiwei Shen, Guoqing Jin, ... (+2 more) | 👻 Ghosted | cs.CV | 242 | 8 years ago |
| 53 | Deep Multimodal Learning for Audio-Visual Speech Recognition<br>Youssef Mroueh, Etienne Marcheret, Vaibhava Goel | 👻 Ghosted | cs.CL | 242 | 11 years ago |
| 54 | CN-CELEB: a challenging Chinese speaker recognition dataset<br>Yue Fan, Jiawen Kang, ... (+8 more) | 👻 Ghosted | eess.AS | 241 | 6 years ago |
| 55 | Towards end-to-end spoken language understanding<br>Dmitriy Serdyuk, Yongqiang Wang, ... (+4 more) | 👻 Ghosted | cs.CL | 241 | 8 years ago |
| 56 | The PyTorch-Kaldi Speech Recognition Toolkit<br>Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio | 👻 Ghosted | eess.AS | 236 | 7 years ago |
| 57 | Fully Supervised Speaker Diarization<br>Aonan Zhang, Quan Wang, ... (+3 more) | 👻 Ghosted | eess.AS | 229 | 7 years ago |
| 58 | A Hardware Architecture for Reconfigurable Intelligent Surfaces with Minimal Active Elements for Explicit Channel Estimation<br>George C. Alexandropoulos, Evangelos Vlachos | 👻 Ghosted | cs.IT | 225 | 6 years ago |
| 59 | Very Deep Multilingual Convolutional Neural Networks for LVCSR<br>Tom Sercu, Christian Puhrsch, ... (+2 more) | 👻 Ghosted | cs.CL | 225 | 10 years ago |
| 60 | Lipreading with Long Short-Term Memory<br>Michael Wand, Jan Koutník, Jürgen Schmidhuber | 👻 Ghosted | cs.CV | 223 | 10 years ago |
| 61 | Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition<br>Chris Donahue, Bo Li, Rohit Prabhavalkar | 👻 Ghosted | cs.SD | 220 | 8 years ago |
| 62 | Fooling End-to-end Speaker Verification by Adversarial Examples<br>Felix Kreuk, Yossi Adi, ... (+2 more) | 👻 Ghosted | cs.LG | 218 | 8 years ago |
| 63 | Batch Normalized Recurrent Neural Networks<br>César Laurent, Gabriel Pereyra, ... (+3 more) | 👻 Ghosted | stat.ML | 218 | 10 years ago |
| 64 | Image Restoration using Total Variation Regularized Deep Image Prior<br>Jiaming Liu, Yu Sun, ... (+2 more) | 👻 Ghosted | cs.CV | 211 | 7 years ago |
| 65 | End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation<br>Yi Luo, Zhuo Chen, ... (+2 more) | 👻 Ghosted | eess.AS | 206 | 6 years ago |
| 66 | End-to-End Automatic Speech Translation of Audiobooks<br>Alexandre Bérard, Laurent Besacier, ... (+2 more) | 👻 Ghosted | cs.CL | 206 | 8 years ago |
| 67 | Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates<br>Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà, ... (+6 more) | 👻 Ghosted | cs.CL | 204 | 6 years ago |
| 68 | Towards Building the Federated GPT: Federated Instruction Tuning<br>Jianyi Zhang, Saeed Vahidian, ... (+7 more) | 💤 Eternal Rest | cs.CL | 203 | 2 years ago |
| 69 | Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset<br>Xie Chen, Yu Wu, ... (+3 more) | 👻 Ghosted | cs.CL | 203 | 5 years ago |
| 70 | Personalized Speech recognition on mobile devices<br>Ian McGraw, Rohit Prabhavalkar, ... (+9 more) | 👻 Ghosted | cs.CL | 198 | 10 years ago |
| 71 | Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition<br>Yangyang Shi, Yongqiang Wang, ... (+6 more) | 👻 Ghosted | cs.SD | 195 | 5 years ago |
| 72 | Replay and Synthetic Speech Detection with Res2net Architecture<br>Xu Li, Na Li, ... (+5 more) | 👻 Ghosted | eess.AS | 192 | 5 years ago |
| 73 | Temporal Coding in Spiking Neural Networks with Alpha Synaptic Function: Learning with Backpropagation<br>Iulia M. Comsa, Krzysztof Potempa, ... (+4 more) | 🌅 Old Age | cs.NE | 190 | 6 years ago |
| 74 | Bias Mitigation Post-processing for Individual and Group Fairness<br>Pranay K. Lohia, Karthikeyan Natesan Ramamurthy, ... (+4 more) | 👻 Ghosted | cs.LG | 187 | 7 years ago |
| 75 | Generative Pre-Training for Speech with Autoregressive Predictive Coding<br>Yu-An Chung, James Glass | 🌅 Old Age | eess.AS | 182 | 6 years ago |
| 76 | Age-Based Scheduling Policy for Federated Learning in Mobile Edge Networks<br>Howard H. Yang, Ahmed Arafa, ... (+2 more) | 👻 Ghosted | cs.IT | 182 | 6 years ago |
| 77 | Convolutional-Recurrent Neural Networks for Speech Enhancement<br>Han Zhao, Shuayb Zarar, ... (+2 more) | 👻 Ghosted | cs.SD | 182 | 7 years ago |
| 78 | A Graph-CNN for 3D Point Cloud Classification<br>Yingxue Zhang, Michael Rabbat | 👻 Ghosted | cs.CV | 178 | 7 years ago |
| 79 | Deep convolutional acoustic word embeddings using word-pair side information<br>Herman Kamper, Weiran Wang, Karen Livescu | 👻 Ghosted | cs.CL | 176 | 10 years ago |
| 80 | Attention-Based Models for Text-Dependent Speaker Verification<br>F A Rezaur Rahman Chowdhury, Quan Wang, ... (+2 more) | 👻 Ghosted | eess.AS | 174 | 8 years ago |
| 81 | A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition<br>Albert Zeyer, Patrick Doetsch, ... (+3 more) | 👻 Ghosted | cs.NE | 172 | 9 years ago |
| 82 | Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation<br>Ye Jia, Melvin Johnson, ... (+7 more) | 👻 Ghosted | cs.CL | 171 | 7 years ago |
| 83 | Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models<br>Rohit Prabhavalkar, Tara N. Sainath, ... (+5 more) | 👻 Ghosted | cs.CL | 171 | 8 years ago |
| 84 | Deep Clustering and Conventional Networks for Music Separation: Stronger Together<br>Yi Luo, Zhuo Chen, ... (+3 more) | 👻 Ghosted | stat.ML | 170 | 9 years ago |
| 85 | PromptTTS: Controllable Text-to-Speech with Text Descriptions<br>Zhifang Guo, Yichong Leng, ... (+3 more) | 👻 Ghosted | eess.AS | 167 | 3 years ago |
| 86 | AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection<br>Joseph Roth, Sourish Chaudhuri, ... (+9 more) | 👻 Ghosted | cs.CV | 165 | 7 years ago |
| 87 | Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens<br>Rafael Valle, Jason Li, ... (+2 more) | 👻 Ghosted | cs.SD | 162 | 6 years ago |
| 88 | wav2letter++: The Fastest Open-source Speech Recognition System<br>Vineel Pratap, Awni Hannun, ... (+6 more) | 👻 Ghosted | cs.CL | 161 | 7 years ago |
| 89 | RoIMix: Proposal-Fusion among Multiple Images for Underwater Object Detection<br>Wei-Hong Lin, Jia-Xing Zhong, ... (+3 more) | 👻 Ghosted | cs.CV | 160 | 6 years ago |
| 90 | Ordered Reliability Bits Guessing Random Additive Noise Decoding<br>Ken R. Duffy | 👻 Ghosted | cs.IT | 156 | 6 years ago |
| 91 | Robust and fine-grained prosody control of end-to-end speech synthesis<br>Younggun Lee, Taesu Kim | 👻 Ghosted | cs.CL | 155 | 7 years ago |
| 92 | SpecAugment on Large Scale Datasets<br>Daniel S. Park, Yu Zhang, ... (+6 more) | 👻 Ghosted | eess.AS | 154 | 6 years ago |
| 93 | Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network<br>Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen | 👻 Ghosted | cs.SD | 153 | 8 years ago |
| 94 | Building competitive direct acoustics-to-word models for English conversational speech recognition<br>Kartik Audhkhasi, Brian Kingsbury, ... (+3 more) | 👻 Ghosted | cs.CL | 152 | 8 years ago |
| 95 | A spelling correction model for end-to-end speech recognition<br>Jinxi Guo, Tara N. Sainath, Ron J. Weiss | 👻 Ghosted | eess.AS | 151 | 7 years ago |
| 96 | ASR is all you need: cross-modal distillation for lip reading<br>Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman | 👻 Ghosted | cs.CV | 149 | 6 years ago |
| 97 | Trainable Frontend For Robust and Far-Field Keyword Spotting<br>Yuxuan Wang, Pascal Getreuer, ... (+3 more) | 👻 Ghosted | cs.CL | 147 | 9 years ago |
| 98 | CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition<br>Linhao Dong, Bo Xu | 👻 Ghosted | cs.CL | 146 | 6 years ago |
| 99 | A Co-Interactive Transformer for Joint Slot Filling and Intent Detection<br>Libo Qin, Tailu Liu, ... (+4 more) | 👻 Ghosted | cs.CL | 145 | 5 years ago |
| 100 | Continuous Speech Separation with Conformer<br>Sanyuan Chen, Yu Wu, ... (+7 more) | 👻 Ghosted | eess.AS | 145 | 5 years ago |