X Modality Assisting RGBT Object Tracking
December 27, 2023 ยท Declared Dead ยท ๐ Applied intelligence (Boston)
Repo contents: README.md, gtot.zip, rgbt234.zip
Authors
Zhaisheng Ding, Haiyan Li, Ruichao Hou, Yanyu Liu, Shidong Xie
arXiv ID
2312.17273
Category
cs.CV: Computer Vision
Citations
4
Venue
Applied intelligence (Boston)
Repository
https://github.com/DZSYUNNAN/XNet
Last Checked
1 month ago
Abstract
Developing robust multi-modal feature representations is crucial for enhancing object tracking performance. In pursuit of this objective, a novel X Modality Assisting Network (X-Net) is introduced, which explores the impact of the fusion paradigm by decoupling visual object tracking into three distinct levels, thereby facilitating subsequent processing. Initially, to overcome the challenges associated with feature learning due to significant discrepancies between RGB and thermal modalities, a plug-and-play pixel-level generation module (PGM) based on knowledge distillation learning is proposed. This module effectively generates the X modality, bridging the gap between the two patterns while minimizing noise interference. Subsequently, to optimize sample feature representation and promote cross-modal interactions, a feature-level interaction module (FIM) is introduced, integrating a mixed feature interaction transformer and a spatial dimensional feature translation strategy. Finally, to address random drifting caused by missing instance features, a flexible online optimization strategy called the decision-level refinement module (DRM) is proposed, which incorporates optical flow and refinement mechanisms. The efficacy of X-Net is validated through experiments on three benchmarks, demonstrating its superiority over state-of-the-art trackers. Notably, X-Net achieves performance gains of 0.47%/1.2% in the average of precise rate and success rate, respectively. Additionally, the research content, data, and code are pledged to be made publicly accessible at https://github.com/DZSYUNNAN/XNet.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way โ ๐ฆด Skeleton Repo
R.I.P.
๐ฆด
Skeleton Repo
EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
R.I.P.
๐ฆด
Skeleton Repo
Deep Learning for 3D Point Clouds: A Survey
R.I.P.
๐ฆด
Skeleton Repo
Adversarial Examples: Attacks and Defenses for Deep Learning
R.I.P.
๐ฆด
Skeleton Repo