Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation

December 27, 2024 ยท Declared Dead ยท ๐Ÿ› IEEE transactions on circuits and systems for video technology (Print)

๐Ÿ“œ CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: LICENSE, README.md

Authors Guangsheng Xu, Guoyi Zhang, Lejia Ye, Shuwei Gan, Xiaohu Zhang, Xia Yang arXiv ID 2412.19676 Category cs.CV: Computer Vision Citations 1 Venue IEEE transactions on circuits and systems for video technology (Print) Repository https://github.com/poker-xu/SSR-STF โญ 6 Last Checked 1 month ago
Abstract
Transformer-based methods have recently achieved significant success in 3D human pose estimation, owing to their strong ability to model long-range dependencies. However, relying solely on the global attention mechanism is insufficient for capturing the fine-grained local details, which are crucial for accurate pose estimation. To address this, we propose SSR-STF, a dual-stream model that effectively integrates local features with global dependencies to enhance 3D human pose estimation. Specifically, we introduce SSRFormer, a simple yet effective module that employs the skeleton selective refine attention (SSRA) mechanism to capture fine-grained local dependencies in human pose sequences, complementing the global dependencies modeled by the Transformer. By adaptively fusing these two feature streams, SSR-STF can better learn the underlying structure of human poses, overcoming the limitations of traditional methods in local feature extraction. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that SSR-STF achieves state-of-the-art performance, with P1 errors of 37.4 mm and 13.2 mm respectively, outperforming existing methods in both accuracy and generalization. Furthermore, the motion representations learned by our model prove effective in downstream tasks such as human mesh recovery. Codes are available at https://github.com/poker-xu/SSR-STF.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

Died the same way โ€” ๐Ÿ“œ Death by README