DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

October 11, 2024 · Declared Dead · 🏛 ACM Multimedia

💀 CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors: Jia Li, Yangchen Yu, Yin Chen, Yu Zhang, Peng Jia, Yunbo Xu, Ziqiang Li, Meng Wang, Richang Hong
arXiv ID: 2410.08470
Category: cs.HC (Human-Computer Interaction)
Cross-listed: cs.CV
Citations: 6
Venue: ACM Multimedia
Repository: https://github.com/MSA-LMC/DAT
Last Checked: 1 month ago
Abstract
Engagement estimation plays a crucial role in understanding human social behaviors, attracting increasing research interest in fields such as affective computing and human-computer interaction. In this paper, we propose the Dialogue-Aware Transformer (DAT) framework with Modality-Group Fusion (MGF), which relies solely on audio-visual input and is language-independent, to estimate human engagement in conversations. Specifically, our method employs a modality-group fusion strategy that independently fuses the audio and visual features of each person before reasoning over the entire audio-visual content. This strategy significantly enhances the model's performance and robustness. Additionally, to better estimate the target participant's engagement level, the Dialogue-Aware Transformer considers both the participant's own behavior and cues from their conversational partners. Our method was rigorously tested in the Multi-Domain Engagement Estimation Challenge held at MultiMediate'24, demonstrating notable improvements in engagement-level regression precision over the baseline model. In particular, our approach achieves a concordance correlation coefficient (CCC) of 0.76 on the NoXi Base test set and an average CCC of 0.64 across the NoXi Base, NoXi-Add, and MPIIGI test sets.
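The abstract outlines the two ideas (per-person modality-group fusion, then dialogue-aware attention over the partner's cues) but, with the repository dead, not the implementation. Below is a minimal PyTorch sketch of how such a pipeline could look. Every class name, feature dimension (d_audio, d_visual, d_model), and layer choice here is an assumption for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class ModalityGroupFusion(nn.Module):
    """Hypothetical sketch: fuse audio and visual features for ONE
    participant before any cross-participant reasoning, as the
    abstract describes. Dimensions are assumed, not from the paper."""
    def __init__(self, d_audio=128, d_visual=256, d_model=256):
        super().__init__()
        self.audio_proj = nn.Linear(d_audio, d_model)
        self.visual_proj = nn.Linear(d_visual, d_model)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, audio, visual):
        # audio: (B, T, d_audio), visual: (B, T, d_visual)
        a = self.audio_proj(audio)
        v = self.visual_proj(visual)
        return self.fuse(torch.cat([a, v], dim=-1))  # (B, T, d_model)

class DialogueAwareBlock(nn.Module):
    """Hypothetical sketch: the target participant attends to their own
    history (self-attention) and to the partner's fused features
    (cross-attention) before frame-level engagement regression."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, target, partner):
        # target, partner: (B, T, d_model) fused features per participant
        x, _ = self.self_attn(target, target, target)
        x, _ = self.cross_attn(x, partner, partner)
        return self.head(x).squeeze(-1)  # (B, T) engagement scores
```

The key design point the abstract claims is the ordering: fusion happens within each person's modality group first, and only afterwards does the transformer mix information across conversation partners.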
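The reported scores use the concordance correlation coefficient, which combines Pearson correlation with penalties for mean and scale mismatch: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). A small NumPy helper, for reference (this is the standard metric definition, not code from the missing repository):

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance correlation coefficient for engagement regression."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()          # population variance
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()   # population covariance
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Toy usage: values near 1 mean predictions track the labels closely.
y_true = np.array([0.2, 0.5, 0.9, 0.4])
y_pred = np.array([0.25, 0.45, 0.80, 0.50])
print(round(ccc(y_true, y_pred), 3))
```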
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt – Human-Computer Interaction

Died the same way – 💀 404 Not Found