Multimodal Transformer for Nursing Activity Recognition

April 09, 2022 · Declared Dead · 🏛 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Authors: Momal Ijaz, Renato Diaz, Chen Chen · arXiv ID: 2204.04564 · Category: cs.CV (Computer Vision) · Cross-listed: cs.HC · Citations: 32 · Venue: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) · Repository: https://github.com/Momilijaz96/MMT_for_NCRC · Last Checked: 1 month ago

Abstract

In an aging population, elderly patient safety is a primary concern at hospitals and nursing homes, which demands increased nurse care. Nurse activity recognition can help ensure that all patients receive an equal, desired level of care, and it can free nurses from manually documenting the activities they perform, leading to a fair and safe place of care for the elderly. In this work, we present a multimodal transformer-based network that extracts features from skeletal joints and acceleration data and fuses them to perform nurse activity recognition. Our method achieves state-of-the-art performance of 81.8% accuracy on the benchmark dataset for nurse activity recognition from the Nurse Care Activity Recognition Challenge (NCRC). We perform ablation studies to show that our fusion model outperforms single-modality transformer variants (using only acceleration or only skeleton joint data). Our solution also outperforms the state-of-the-art ST-GCN, GRU, and other classical hand-crafted-feature-based classifier solutions by a margin of 1.6% on the NCRC dataset. Code is available at https://github.com/Momilijaz96/MMT_for_NCRC.
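The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not specify how the two feature streams are fused, so the sketch assumes simple feature-level concatenation, and it replaces the transformer encoders with mean pooling over time. All names, dimensions, and the class count are hypothetical and chosen only for illustration.

```python
import math
import random


def mean_pool(sequence):
    """Average a (time x features) sequence into one feature vector.

    Stand-in for a per-modality encoder (assumption: the paper uses
    transformers here; mean pooling keeps the sketch dependency-free).
    """
    steps = len(sequence)
    width = len(sequence[0])
    return [sum(frame[i] for frame in sequence) / steps for i in range(width)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(skeleton_seq, accel_seq, weights, bias):
    """Concatenate per-modality features, then apply a linear classifier."""
    fused = mean_pool(skeleton_seq) + mean_pool(accel_seq)  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical sizes, not taken from the paper or the dataset.
SKELETON_DIM, ACCEL_DIM, NUM_CLASSES, STEPS = 4, 3, 3, 10

random.seed(0)
skeleton = [[random.random() for _ in range(SKELETON_DIM)] for _ in range(STEPS)]
accel = [[random.random() for _ in range(ACCEL_DIM)] for _ in range(STEPS)]
weights = [
    [random.uniform(-1, 1) for _ in range(SKELETON_DIM + ACCEL_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probs = fuse_and_classify(skeleton, accel, weights, bias)
print(probs)  # one probability per hypothetical activity class
```

A single-modality variant, as in the ablation described above, would correspond to feeding only one of the two pooled vectors to the classifier, with the weight rows shortened to match.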