| 301 | MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning<br>Ruize Xu, Ruoxuan Feng, ... (+2 more) | 👻 Ghosted | cs.SD | 48 | 3 years ago |
| 302 | Learning Online Alignments with Continuous Rewards Policy Gradient<br>Yuping Luo, Chung-Cheng Chiu, ... (+2 more) | 👻 Ghosted | cs.LG | 47 | 9 years ago |
| 303 | A Coupled Compressive Sensing Scheme for Unsourced Multiple Access<br>Vamsi K. Amalladinne, Avinash Vem, ... (+3 more) | 👻 Ghosted | cs.IT | 47 | 8 years ago |
| 304 | Dense Multimodal Fusion for Hierarchically Joint Representation<br>Di Hu, Feiping Nie, Xuelong Li | 👻 Ghosted | cs.CV | 47 | 7 years ago |
| 305 | Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model<br>Alexander H. Liu, Hung-yi Lee, Lin-shan Lee | 👻 Ghosted | cs.CL | 47 | 7 years ago |
| 306 | Class-conditional embeddings for music source separation<br>Prem Seetharaman, Gordon Wichern, ... (+2 more) | 👻 Ghosted | cs.SD | 47 | 7 years ago |
| 307 | Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments<br>Ethan Manilow, Prem Seetharaman, Bryan Pardo | 👻 Ghosted | eess.AS | 47 | 6 years ago |
| 308 | PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network<br>Chengqi Deng, Chengzhu Yu, ... (+3 more) | 👻 Ghosted | cs.SD | 47 | 6 years ago |
| 309 | BBAND Index: A No-Reference Banding Artifact Predictor<br>Zhengzhong Tu, Jessie Lin, ... (+3 more) | 👻 Ghosted | eess.IV | 47 | 6 years ago |
| 310 | Efficient Arabic emotion recognition using deep neural networks<br>Ahmed Ali, Yasser Hifny | 👻 Ghosted | cs.CL | 47 | 5 years ago |
| 311 | BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers<br>Eunjung Han, Chul Lee, Andreas Stolcke | 👻 Ghosted | cs.SD | 47 | 5 years ago |
| 312 | FAPM: Fast Adaptive Patch Memory for Real-time Industrial Anomaly Detection<br>Donghyeong Kim, Chaewon Park, ... (+2 more) | 👻 Ghosted | cs.CV | 47 | 3 years ago |
| 313 | Image denoising via group sparsity residual constraint<br>Zhiyuan Zha, Xin Liu, ... (+8 more) | 👻 Ghosted | cs.CV | 46 | 9 years ago |
| 314 | Deep Multi-view Models for Glitch Classification<br>Sara Bahaadini, Neda Rohani, ... (+4 more) | 👻 Ghosted | cs.LG | 46 | 9 years ago |
| 315 | Representation Mixing for TTS Synthesis<br>Kyle Kastner, João Felipe Santos, ... (+2 more) | 👻 Ghosted | cs.LG | 46 | 7 years ago |
| 316 | Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations<br>Wen-Chin Huang, Yi-Chiao Wu, ... (+2 more) | 👻 Ghosted | eess.AS | 46 | 5 years ago |
| 317 | Untargeted Backdoor Attack against Object Detection<br>Chengxiao Luo, Yiming Li, ... (+2 more) | 👻 Ghosted | cs.CV | 46 | 3 years ago |
| 318 | A Foundation Model for Music Informatics<br>Minz Won, Yun-Ning Hung, Duc Le | 👻 Ghosted | cs.SD | 46 | 2 years ago |
| 319 | StemGen: A music generation model that listens<br>Julian D. Parker, Janne Spijkervet, ... (+7 more) | 👻 Ghosted | cs.SD | 46 | 2 years ago |
| 320 | Temporally Aligned Audio for Video with Autoregression<br>Ilpo Viertola, Vladimir Iashin, Esa Rahtu | 👻 Ghosted | cs.CV | 46 | 1 year ago |
| 321 | Decoding visemes: improving machine lipreading<br>Helen L. Bear, Richard Harvey | 👻 Ghosted | cs.CV | 45 | 8 years ago |
| 322 | End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator<br>Andros Tjandra, Sakriani Sakti, Satoshi Nakamura | 👻 Ghosted | cs.CL | 45 | 7 years ago |
| 323 | A Recurrent Graph Neural Network for Multi-Relational Data<br>Vassilis N. Ioannidis, Antonio G. Marques, Georgios B. Giannakis | 👻 Ghosted | cs.LG | 45 | 7 years ago |
| 324 | Similarity Learning for Authorship Verification in Social Media<br>Benedikt Boenninghoff, Robert M. Nickel, ... (+2 more) | 👻 Ghosted | cs.CL | 45 | 6 years ago |
| 325 | End-to-End Speaker Diarization as Post-Processing<br>Shota Horiguchi, Paola Garcia, ... (+3 more) | 👻 Ghosted | eess.AS | 45 | 5 years ago |
| 326 | Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech<br>Cheol Jun Cho, Peter Wu, ... (+2 more) | 👻 Ghosted | eess.AS | 45 | 3 years ago |
| 327 | Learning From Yourself: A Self-Distillation Method for Fake Speech Detection<br>Jun Xue, Cunhang Fan, ... (+5 more) | 👻 Ghosted | cs.SD | 45 | 3 years ago |
| 328 | AMC-Net: An Effective Network for Automatic Modulation Classification<br>Jiawei Zhang, Tiantian Wang, ... (+2 more) | 👻 Ghosted | eess.SP | 45 | 3 years ago |
| 329 | Deep Multimodal Learning for Emotion Recognition in Spoken Language<br>Yue Gu, Shuhong Chen, Ivan Marsic | 👻 Ghosted | cs.CL | 44 | 8 years ago |
| 330 | Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition<br>Wei-Ning Hsu, James Glass | 👻 Ghosted | cs.CL | 44 | 8 years ago |
| 331 | Towards Unsupervised Speech-to-Text Translation<br>Yu-An Chung, Wei-Hung Weng, ... (+2 more) | 👻 Ghosted | cs.CL | 44 | 7 years ago |
| 332 | Transfer learning of language-independent end-to-end ASR with language model fusion<br>Hirofumi Inaguma, Jaejin Cho, ... (+3 more) | 👻 Ghosted | cs.CL | 44 | 7 years ago |
| 333 | Deep geometric knowledge distillation with graphs<br>Carlos Lassance, Myriam Bontonou, ... (+4 more) | 👻 Ghosted | cs.LG | 44 | 6 years ago |
| 334 | Dynamic Sparsity Neural Networks for Automatic Speech Recognition<br>Zhaofeng Wu, Ding Zhao, ... (+4 more) | 👻 Ghosted | eess.AS | 44 | 6 years ago |
| 335 | Visual Prompting for Adversarial Robustness<br>Aochuan Chen, Peter Lorenz, ... (+3 more) | 👻 Ghosted | cs.CV | 44 | 3 years ago |
| 336 | Egocentric Activity Recognition with Multimodal Fisher Vector<br>Sibo Song, Ngai-Man Cheung, ... (+3 more) | 👻 Ghosted | cs.MM | 43 | 10 years ago |
| 337 | Son of Zorn's Lemma: Targeted Style Transfer Using Instance-aware Semantic Segmentation<br>Carlos Castillo, Soham De, ... (+4 more) | 👻 Ghosted | cs.CV | 43 | 9 years ago |
| 338 | Revisiting the problem of audio-based hit song prediction using convolutional neural networks<br>Li-Chia Yang, Szu-Yu Chou, ... (+3 more) | 👻 Ghosted | cs.SD | 43 | 9 years ago |
| 339 | Visual Features for Context-Aware Speech Recognition<br>Abhinav Gupta, Yajie Miao, ... (+2 more) | 👻 Ghosted | cs.CL | 43 | 8 years ago |
| 340 | Dynamic Temporal Alignment of Speech to Lips<br>Tavi Halperin, Ariel Ephrat, Shmuel Peleg | 👻 Ghosted | cs.CV | 43 | 7 years ago |
| 341 | Geometry of Deep Learning for Magnetic Resonance Fingerprinting<br>Mohammad Golbabaee, Dongdong Chen, ... (+3 more) | 👻 Ghosted | cs.LG | 43 | 7 years ago |
| 342 | Contextual Speech Recognition with Difficult Negative Training Examples<br>Uri Alon, Golan Pundak, Tara N. Sainath | 👻 Ghosted | eess.AS | 43 | 7 years ago |
| 343 | To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition<br>Yossi Adi, Neil Zeghidour, ... (+4 more) | 👻 Ghosted | cs.LG | 43 | 7 years ago |
| 344 | Emotional Voice Conversion using Multitask Learning with Text-to-speech<br>Tae-Ho Kim, Sungjae Cho, ... (+3 more) | 👻 Ghosted | eess.AS | 43 | 6 years ago |
| 345 | Streaming Simultaneous Speech Translation with Augmented Memory Transformer<br>Xutai Ma, Yongqiang Wang, ... (+3 more) | 👻 Ghosted | cs.CL | 43 | 5 years ago |
| 346 | Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input<br>Daisuke Niizumi, Daiki Takeuchi, ... (+3 more) | 👻 Ghosted | eess.AS | 43 | 3 years ago |
| 347 | SpeechLMScore: Evaluating speech generation using speech language model<br>Soumi Maiti, Yifan Peng, ... (+2 more) | 👻 Ghosted | eess.AS | 43 | 3 years ago |
| 348 | Distributed Gradient Descent with Coded Partial Gradient Computations<br>Emre Ozfatura, Sennur Ulukus, Deniz Gunduz | 👻 Ghosted | cs.LG | 42 | 7 years ago |
| 349 | Nose, eyes and ears: Head pose estimation by locating facial keypoints<br>Aryaman Gupta, Kalpit Thakkar, ... (+2 more) | 👻 Ghosted | cs.CV | 42 | 7 years ago |
| 350 | SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition<br>Zhen Huang, Tim Ng, ... (+4 more) | 👻 Ghosted | cs.LG | 42 | 6 years ago |