AlignNet: A Unifying Approach to Audio-Visual Alignment
February 12, 2020 ยท Declared Dead ยท ๐ IEEE Workshop/Winter Conference on Applications of Computer Vision
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jianren Wang, Zhaoyuan Fang, Hang Zhao
arXiv ID
2002.05070
Category
cs.CV: Computer Vision
Cross-listed
cs.LG,
cs.MM,
cs.SD,
eess.AS
Citations
42
Venue
IEEE Workshop/Winter Conference on Applications of Computer Vision
Last Checked
3 months ago
Abstract
We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments. AlignNet learns the end-to-end dense correspondence between each frame of a video and an audio. Our method is designed according to simple and well-established principles: attention, pyramidal processing, warping, and affinity function. Together with the model, we release a dancing dataset Dance50 for training and evaluation. Qualitative, quantitative and subjective evaluation results on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms the state-of-the-art methods. Project video and code are available at https://jianrenw.github.io/AlignNet.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted