R.I.P.
๐ป
Ghosted
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
November 09, 2023 ยท Entered Twilight ยท ๐ Neural Information Processing Systems
Repo contents: DG-SCT, README.md, data, few-shot, figs, pretrain, requirements.txt, zero-shot
Authors
Haoyi Duan, Yan Xia, Mingze Zhou, Li Tang, Jieming Zhu, Zhou Zhao
arXiv ID
2311.05152
Category
cs.LG: Machine Learning
Cross-listed
cs.AI,
cs.CV,
cs.MM
Citations
43
Venue
Neural Information Processing Systems
Repository
https://github.com/haoyi-duan/DG-SCT
โญ 68
Last Checked
1 month ago
Abstract
In recent years, the deployment of large-scale pre-trained models in audio-visual downstream tasks has yielded remarkable outcomes. However, these models, primarily trained on single-modality unconstrained datasets, still encounter challenges in feature extraction for multi-modal tasks, leading to suboptimal performance. This limitation arises due to the introduction of irrelevant modality-specific information during encoding, which adversely affects the performance of downstream tasks. To address this challenge, this paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism. This mechanism leverages audio and visual modalities as soft prompts to dynamically adjust the parameters of pre-trained models based on the current multi-modal input features. Specifically, the DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders, allowing adaptive extraction of crucial information from the current modality across spatial, channel, and temporal dimensions, while preserving the frozen parameters of large-scale pre-trained models. Experimental evaluations demonstrate that our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA. Furthermore, our model exhibits promising performance in challenging few-shot and zero-shot scenarios. The source code and pre-trained models are available at https://github.com/haoyi-duan/DG-SCT.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
๐ป
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
๐ป
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
๐ป
Ghosted