Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions

June 23, 2025 ยท Declared Dead ยท ๐Ÿ› The Visual Computer

๐Ÿฆด CAUSE OF DEATH: Skeleton Repo
Boilerplate only, no real code

Repo contents: LICENSE, README.md, SECURITY.md

Authors Vineet Kumar Rakesh, Soumya Mazumdar, Research Pratim Maity, Sarbajit Pal, Amitabha Das, Tapas Samanta arXiv ID 2507.02900 Category cs.CV: Computer Vision Cross-listed cs.AI, cs.GR, cs.HC, cs.MM Citations 2 Venue The Visual Computer Repository https://github.com/VineetKumarRakesh/thg โญ 11 Last Checked 1 month ago
Abstract
Talking Head Generation (THG) has emerged as a transformative technology in computer vision, enabling the synthesis of realistic human faces synchronized with image, audio, text, or video inputs. This paper provides a comprehensive review of methodologies and frameworks for talking head generation, categorizing approaches into 2D--based, 3D--based, Neural Radiance Fields (NeRF)--based, diffusion--based, parameter-driven techniques and many other techniques. It evaluates algorithms, datasets, and evaluation metrics while highlighting advancements in perceptual realism and technical efficiency critical for applications such as digital avatars, video dubbing, ultra-low bitrate video conferencing, and online education. The study identifies challenges such as reliance on pre--trained models, extreme pose handling, multilingual synthesis, and temporal consistency. Future directions include modular architectures, multilingual datasets, hybrid models blending pre--trained and task-specific layers, and innovative loss functions. By synthesizing existing research and exploring emerging trends, this paper aims to provide actionable insights for researchers and practitioners in the field of talking head generation. For the complete survey, code, and curated resource list, visit our GitHub repository: https://github.com/VineetKumarRakesh/thg.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

Died the same way โ€” ๐Ÿฆด Skeleton Repo

R.I.P. ๐Ÿฆด Skeleton Repo

Neural Style Transfer: A Review

Yongcheng Jing, Yezhou Yang, ... (+4 more)

cs.CV ๐Ÿ› IEEE TVCG ๐Ÿ“š 828 cites 8 years ago