Real-Time Target Sound Extraction

November 04, 2022 · Entered Twilight · 🏛 IEEE International Conference on Acoustics, Speech, and Signal Processing

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, .ruby-version, CHANGELOG.md, CONTRIBUTING.md, LICENSE, README.md, bower.json, dist, gulpfile.js, package.json, src, test

Authors: Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota
arXiv ID: 2211.02250
Category: cs.SD (Sound)
Cross-listed: cs.LG, eess.AS
Citations: 44
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository: https://github.com/danlevan/google-material-color (⭐ 165)
Last Checked: 6 days ago

Abstract

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.
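
The abstract credits dilated causal convolutions with covering a large receptive field cheaply. The sketch below illustrates that general idea only. It is not the Waveformer encoder: the kernel size, the dilation schedule, and the function names (`receptive_field`, `causal_dilated_conv`, `run_stack`) are all assumptions chosen for illustration. A causal layer computes each output sample from current and past inputs only, which is what makes streaming possible, and doubling the dilation at each layer makes the receptive field grow exponentially with depth while the per-layer cost stays fixed.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that can influence one output sample
    after a stack of dilated convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(
    x: Sequence[float], weights: Sequence[float], dilation: int
) -> List[float]:
    """Single-channel causal dilated convolution.

    y[t] = sum_k weights[k] * x[t - k * dilation]
    Samples before t = 0 are treated as zero (left padding), so y[t]
    never depends on future inputs.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def run_stack(
    x: Sequence[float], weights: Sequence[float], dilations: Sequence[int]
) -> List[float]:
    """Apply one causal dilated layer per dilation, reusing the same weights."""
    out = list(x)
    for d in dilations:
        out = causal_dilated_conv(out, weights, d)
    return out


# Hypothetical settings, not taken from the paper.
KERNEL_SIZE = 3
DILATIONS = [1, 2, 4, 8]
WEIGHTS = [1.0] * KERNEL_SIZE

# Feed a unit impulse and count how many output samples it reaches.
impulse = [1.0] + [0.0] * 39
response = run_stack(impulse, WEIGHTS, DILATIONS)
reached = sum(1 for v in response if v != 0.0)

print(receptive_field(KERNEL_SIZE, DILATIONS), reached)
```

With these assumed settings, four layers of kernel size 3 see 31 past samples, whereas four undilated layers of the same size would see only 9. The impulse test checks the same number empirically: a single input sample affects exactly as many output samples as the receptive field predicts, and none before its own position.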