R.I.P.
👻
Ghosted
Memory-augmented Contrastive Learning for Talking Head Generation
February 27, 2023 · Entered Twilight · IEEE International Conference on Acoustics, Speech, and Signal Processing
Repo contents: mdn, self_supervised
Authors
Jianrong Wang, Yaxin Zhao, Li Liu, Hongkai Fan, Tianyi Xu, Qi Li, Sen Li
arXiv ID
2302.13469
Category
cs.MM: Multimedia
Citations
7
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository
https://github.com/Yaxinzhao97/MACL.git
⭐ 5
Last Checked
1 month ago
Abstract
Given one reference facial image and a piece of speech as input, talking head generation aims to synthesize a realistic-looking talking head video. However, generating a lip-synchronized video with natural head movements is challenging: the same speech clip can correspond to multiple plausible lip and head movements, so there is no one-to-one mapping between speech and motion. To address this problem, we propose a Speech Feature Extractor (SFE) based on memory-augmented self-supervised contrastive learning, which introduces a memory module to store multiple different speech mapping results. In addition, we introduce Mixture Density Networks (MDN) into the landmark regression task to generate multiple predicted facial landmarks. Extensive qualitative and quantitative experiments show that the quality of our facial animation is significantly superior to that of state-of-the-art (SOTA) methods. The code has been released at https://github.com/Yaxinzhao97/MACL.git.
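The abstract's point about one-to-many mappings can be illustrated with a minimal mixture density sketch. This is not the authors' implementation: it assumes a single scalar landmark coordinate modeled by a 1-D Gaussian mixture, and the function names (`softmax`, `mdn_nll`, `mdn_sample`) and the example parameters are hypothetical. It shows how a mixture assigns high likelihood to several distinct targets and how different predictions can be sampled from the same parameters.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized scores into mixture weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mdn_nll(logits, means, sigmas, target):
    """Negative log-likelihood of a scalar target under a 1-D Gaussian mixture."""
    weights = softmax(logits)
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        z = (target - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log(density)


def mdn_sample(logits, means, sigmas, rng):
    """Draw one prediction: pick a component by weight, then sample its Gaussian."""
    weights = softmax(logits)
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sigmas[k])


# Hypothetical network outputs for one landmark coordinate: two plausible modes.
logits = [0.0, 0.0]
means = [-1.0, 1.0]
sigmas = [0.1, 0.1]

rng = random.Random(0)
samples = [mdn_sample(logits, means, sigmas, rng) for _ in range(5)]
```

Under these assumed parameters, targets near either mode (-1 or 1) get a low negative log-likelihood, while a target at 0, the average a plain regression loss would tend to favor, gets a high one. Each sample lands near one of the two modes.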
Similar Papers
In the same crypt: Multimedia
Old Age
Quality Assessment of In-the-Wild Videos
R.I.P.
👻
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
R.I.P.
👻
Ghosted
A Comprehensive Survey on Cross-modal Retrieval
R.I.P.
👻
Ghosted
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
👻
Ghosted