Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer
October 23, 2020 ยท Declared Dead ยท ๐ IEEE International Conference on Acoustics, Speech, and Signal Processing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li
arXiv ID
2010.12180
Category
cs.SD: Sound
Cross-listed
cs.CL,
eess.AS
Citations
34
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Last Checked
2 months ago
Abstract
With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently. However, multi-channel speech separation sometimes does not necessarily need such a heavy structure for all time frames especially when the cross-talker challenge happens only occasionally. For example, in conversation scenarios, most regions contain only a single active speaker, where the separation task downgrades to a single speaker enhancement problem. It turns out that using a very deep network structure for dealing with signals with a low overlap ratio not only negatively affects the inference efficiency but also hurts the separation performance. To deal with this problem, we propose an early exit mechanism, which enables the Transformer model to handle different cases with adaptive depth. Experimental results indicate that not only does the early exit mechanism accelerate the inference, but it also improves the accuracy.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Sound
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
CNN Architectures for Large-Scale Audio Classification
R.I.P.
๐ป
Ghosted
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
R.I.P.
๐ป
Ghosted
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R.I.P.
๐ป
Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted