EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation

April 12, 2025 · Declared Dead · 🏛 ACM Multimedia

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Jianqiang Ren, Liefeng Bo, Zhigang Tu
arXiv ID: 2504.09209
Category: cs.GR (Graphics)
Cross-listed: cs.CV, cs.SD
Citations: 4
Venue: ACM Multimedia
Last Checked: 3 months ago
Abstract
Masked modeling frameworks have shown promise in co-speech motion generation. However, they struggle to identify semantically significant frames for effective motion masking. In this work, we propose a speech-queried attention-based mask modeling framework for co-speech motion generation. Our key insight is to leverage motion-aligned speech features to guide the masked motion modeling process, selectively masking rhythm-related and semantically expressive motion frames. Specifically, we first propose a motion-audio alignment module (MAM) to construct a latent motion-audio joint space. Both low-level and high-level speech features are projected into this space, enabling motion-aligned speech representations through learnable speech queries. Then, a speech-queried attention mechanism (SQA) computes frame-level attention scores through interactions between motion keys and speech queries, guiding selective masking toward motion frames with high attention scores. Finally, the motion-aligned speech features are also injected into the generation network to facilitate co-speech motion generation. Qualitative and quantitative evaluations confirm that our method outperforms existing state-of-the-art approaches, producing high-quality co-speech motion.
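No code accompanies this listing, so the following is only a rough sketch of the masking idea the abstract describes, not the authors' implementation. It assumes that (1) learnable queries cross-attend over projected speech features to produce motion-aligned speech queries, (2) a frame's score is the softmax attention it receives from those queries, averaged over queries, and (3) "selective masking" means masking the top `mask_ratio` fraction of frames by score. The helper names (`cross_attend`, `frame_scores`, `select_mask`), the query averaging, the top-k rule, and the toy 2-D features are all hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted sum of values.

    Hypothetical stand-in for learnable queries pooling speech features.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w[t] * values[t][j] for t in range(len(values)))
                    for j in range(len(values[0]))])
    return out


def frame_scores(speech_queries, motion_keys):
    """Hypothetical SQA-style scores: per-query softmax over frames, averaged over queries."""
    scale = 1.0 / math.sqrt(len(motion_keys[0]))
    num_frames = len(motion_keys)
    scores = [0.0] * num_frames
    for q in speech_queries:
        attn = softmax([dot(q, k) * scale for k in motion_keys])
        for t in range(num_frames):
            scores[t] += attn[t] / len(speech_queries)
    return scores


def select_mask(scores, mask_ratio):
    """True marks a frame to mask: the top mask_ratio fraction of frames by score."""
    n_mask = max(1, round(mask_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    chosen = set(ranked[:n_mask])
    return [t in chosen for t in range(len(scores))]


if __name__ == "__main__":
    # Toy 2-D features; a real model would use learned, high-dimensional embeddings.
    learnable_queries = [[1.0, 0.0]]
    speech_feats = [[1.0, 0.0], [0.9, 0.1]]  # stand-in for projected speech features
    speech_queries = cross_attend(learnable_queries, speech_feats, speech_feats)
    motion_keys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    scores = frame_scores(speech_queries, motion_keys)
    print(select_mask(scores, 0.5))  # prints [False, True, False, True]
```

Frames whose keys align most strongly with the speech queries (frames 1 and 3 in the toy data) are the ones masked. Deterministic top-k is just one option; sampling frames with probability proportional to their scores would be an equally plausible reading of "guiding selective masking", and the abstract does not say which the paper uses.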

πŸ“œ Similar Papers

In the same crypt β€” Graphics

R.I.P. πŸ‘» Ghosted

Everybody Dance Now

Caroline Chan, Shiry Ginosar, ... (+2 more)

cs.GR πŸ› ICCV πŸ“š 820 cites 7 years ago
R.I.P. πŸ‘» Ghosted

Animating Human Athletics

Jessica K. Hodgins, Wayne L. Wooten, ... (+2 more)

cs.GR πŸ› SIGGRAPH πŸ“š 765 cites 3 years ago

Died the same way: 👻 Ghosted