๐ฎ
๐ฎ
The Ethereal
A Survey on Speech Deepfake Detection
April 22, 2024 ยท The Cartographer ยท ๐ ACM Computing Surveys
"No code URL or promise found in abstract"
"Title-pattern auto-detect: A Survey on Speech Deepfake Detection"
Evidence collected by the PWNC Scanner
Authors
Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang
arXiv ID
2404.13914
Category
cs.SD: Sound
Cross-listed
cs.CR,
cs.MM,
eess.AS
Citations
70
Venue
ACM Computing Surveys
Last Checked
8 days ago
Abstract
The availability of smart devices leads to an exponential increase in multimedia content. However, advancements in deep learning have also enabled the creation of highly sophisticated Deepfake content, including speech Deepfakes, which pose a serious threat by generating realistic voices and spreading misinformation. To combat this, numerous challenges have been organized to advance speech Deepfake detection techniques. In this survey, we systematically analyze more than 200 papers published up to March 2024. We provide a comprehensive review of each component in the detection pipeline, including model architectures, optimization techniques, generalizability, evaluation metrics, performance comparisons, available datasets, and open source availability. For each aspect, we assess recent progress and discuss ongoing challenges. In addition, we explore emerging topics such as partial Deepfake detection, cross-dataset evaluation, and defences against adversarial attacks, while suggesting promising research directions. This survey not only identifies the current state of the art to establish strong baselines for future experiments but also offers clear guidance for researchers aiming to enhance speech Deepfake detection systems.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Sound
R.I.P.
๐ป
Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
R.I.P.
๐ป
Ghosted
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
R.I.P.
๐ป
Ghosted
TasNet: time-domain audio separation network for real-time, single-channel speech separation
R.I.P.
๐ป
Ghosted
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
R.I.P.
๐ป
Ghosted