R.I.P.
👻
Ghosted
Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval
February 07, 2022 · Entered Twilight · The Web Conference
Repo contents: .DS_Store, README.md, base, compress_embed.py, configs, data, data_loader, figs, get_embed.py, logs, model, parse_config.py, requirements.txt, train.py, trainer, utils
Authors
Jinpeng Wang, Bin Chen, Dongliang Liao, Ziyun Zeng, Gongfu Li, Shu-Tao Xia, Jin Xu
arXiv ID
2202.03384
Category
cs.IR: Information Retrieval
Cross-listed
cs.CV, cs.LG, cs.MM, cs.SI
Citations
10
Venue
The Web Conference
Repository
https://github.com/gimpong/WWW22-HCQ
⭐ 17
Last Checked
1 month ago
Abstract
With the recent boom of video-based social platforms (e.g., YouTube and TikTok), video retrieval using sentence queries has become an important demand and attracts increasing research attention. Despite the decent performance, existing text-video retrieval models in vision and language communities are impractical for large-scale Web search because they adopt brute-force search based on high-dimensional embeddings. To improve efficiency, Web search engines widely apply vector compression libraries (e.g., FAISS) to post-process the learned embeddings. Unfortunately, separate compression from feature encoding degrades the robustness of representations and incurs performance decay. To pursue a better balance between performance and efficiency, we propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ). Specifically, HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos and preserve comprehensive semantic information. By performing Asymmetric-Quantized Contrastive Learning (AQ-CL) across views, HCQ aligns texts and videos at coarse-grained and multiple fine-grained levels. This hybrid-grained learning strategy serves as strong supervision on the cross-view video quantization model, where contrastive learning at different levels can be mutually promoted. Extensive experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods while showing high efficiency in storage and computation. Code and configurations are available at https://github.com/gimpong/WWW22-HCQ.
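The abstract contrasts brute-force search over high-dimensional embeddings with search over quantized codes, and describes matching a query against quantized items "asymmetrically". A minimal sketch of that general idea follows, assuming plain product quantization with random (untrained) codebooks and hard nearest-codeword assignment. It is not the HCQ model or its AQ-CL objective, and every name in it (`encode`, `asymmetric_score`, the toy vectors) is hypothetical.

```python
import random

def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest codeword."""
    codes = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(book)), key=lambda k: sq_dist(sub, book[k])))
    return codes

def decode(codes, codebooks):
    """Rebuild the approximate vector by concatenating the chosen codewords."""
    out = []
    for c, book in zip(codes, codebooks):
        out.extend(book[c])
    return out

def asymmetric_score(query, codes, codebooks):
    """Inner product between an uncompressed query and a quantized item.

    One lookup table per subspace holds the query's inner product with
    every codeword, so scoring an item costs one table lookup per code.
    """
    subs = split(query, len(codebooks))
    tables = [[dot(sub, word) for word in book] for sub, book in zip(subs, codebooks)]
    return sum(table[c] for table, c in zip(tables, codes))

# Toy setup (assumed values): 4-dim embeddings, 2 subspaces, 4 codewords each.
rng = random.Random(0)
codebooks = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)] for _ in range(2)]
video = [rng.uniform(-1, 1) for _ in range(4)]   # stands in for a video embedding
query = [rng.uniform(-1, 1) for _ in range(4)]   # stands in for a text embedding

codes = encode(video, codebooks)                 # the video is stored as 2 small integers
score = asymmetric_score(query, codes, codebooks)
```

In this sketch only the stored item is compressed; the query stays in full precision, and the table lookup returns the same value as an inner product with the decoded vector.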
Similar Papers
In the same crypt: Information Retrieval
R.I.P.
👻
Ghosted
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
R.I.P.
👻
Ghosted
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Old Age
Neural Graph Collaborative Filtering
R.I.P.
👻
Ghosted
Self-Attentive Sequential Recommendation
R.I.P.
👻
Ghosted