LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos

December 17, 2020 · Entered Twilight · 🏛 ACM Multimedia

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitattributes, .gitignore, CAD120, README.md, V-COCO, requirements.txt, teaser.PNG

Authors: Sai Praneeth Reddy Sunkesula, Rishabh Dabral, Ganesh Ramakrishnan
arXiv ID: 2012.09402
Category: cs.CV (Computer Vision)
Citations: 37
Venue: ACM Multimedia
Repository: https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI
Stars: ⭐ 16
Last checked: 1 month ago

Abstract

Analyzing the interactions between humans and objects in a video involves identifying the relationships between the humans and the objects present in it. The task can be viewed as a specialized form of Visual Relationship Detection in which one of the two entities must be a human. While traditional methods formulate the problem as inference over a sequence of video segments, we present LIGHTEN, a hierarchical approach that learns visual features to capture spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN does not rely on ground-truth data such as depth maps or 3D human pose, which improves generalization to non-RGB-D datasets. Furthermore, it achieves this using only visual features, instead of the commonly used hand-crafted spatial features. We achieve state-of-the-art results on CAD-120 in human-object interaction detection (88.9% and 92.6%) and anticipation, and competitive results on image-based HOI detection on the V-COCO dataset, setting a new benchmark for approaches based on visual features. Code for LIGHTEN is available at https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI
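The abstract describes the approach only at a high level. To make the idea of "graph reasoning per frame, then aggregation at several temporal granularities" concrete, here is a minimal, dependency-free sketch under stated assumptions. It is not the LIGHTEN implementation from the linked repository: the function names, the fully connected human-object graph, the fixed mixing weight `alpha`, and the use of mean pooling in place of learned temporal networks are all hypothetical simplifications.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def spatial_message_pass(nodes: List[Vector], alpha: float = 0.5) -> List[Vector]:
    """One round of message passing on a fully connected human-object graph.

    Each node mixes its own feature with the mean of all other nodes.
    `alpha` is a hypothetical fixed weight; a trained model would learn this.
    """
    updated = []
    for i, node in enumerate(nodes):
        neighbours = [other for j, other in enumerate(nodes) if j != i]
        if not neighbours:
            # A graph with a single node receives no messages.
            updated.append(list(node))
            continue
        message = mean(neighbours)
        updated.append([alpha * a + (1 - alpha) * b for a, b in zip(node, message)])
    return updated


def hierarchical_pool(frames: List[List[Vector]], segment_len: int) -> Dict[str, object]:
    """Aggregate per-frame graph features at frame, segment and video level.

    frames[t] holds the node features of frame t (node 0 = the human).
    Mean pooling stands in for learned temporal networks (an assumption).
    """
    frame_level = [mean(spatial_message_pass(nodes)) for nodes in frames]
    segment_level = [
        mean(frame_level[start:start + segment_len])
        for start in range(0, len(frame_level), segment_len)
    ]
    video_level = mean(segment_level)
    return {"frame": frame_level, "segment": segment_level, "video": video_level}


if __name__ == "__main__":
    # Four frames, each with one human node and one object node (2-D features).
    frames = [[[float(t), 0.0], [0.0, 1.0]] for t in range(4)]
    out = hierarchical_pool(frames, segment_len=2)
    print(out["segment"])
    print(out["video"])
```

With four frames and `segment_len=2`, the sketch produces four frame-level vectors, two segment-level vectors and one video-level vector, which is the sense in which cues are summarized "at multiple granularities". Mean pooling keeps the example self-contained; a trainable model would replace both the fixed `alpha` and the pooling with learned layers.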