Real-Time Target Sound Extraction

November 04, 2022 ยท Entered Twilight ยท ๐Ÿ› IEEE International Conference on Acoustics, Speech, and Signal Processing

๐ŸŒ… TWILIGHT: Old Age
Predates the code-sharing era โ€” a pioneer of its time

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, .ruby-version, CHANGELOG.md, CONTRIBUTING.md, LICENSE, README.md, bower.json, dist, gulpfile.js, package.json, src, test

Authors Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota arXiv ID 2211.02250 Category cs.SD: Sound Cross-listed cs.LG, eess.AS Citations 44 Venue IEEE International Conference on Acoustics, Speech, and Signal Processing Repository https://github.com/danlevan/google-material-color โญ 165 Last Checked 6 days ago
Abstract
We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Sound