Cyclic Vision-Language Manipulator: Towards Reliable and Fine-Grained Image Interpretation for Automated Report Generation

November 08, 2024 · Declared Dead · 🏛 International Joint Conference on Artificial Intelligence

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Zhiling Yue, Yijian Gao, Junzhi Ning, Zhi Li, Simon Walsh, Guang Yang arXiv ID 2411.05261 Category cs.CV: Computer Vision Cross-listed cs.AI, cs.CL, cs.LG Citations 1 Venue International Joint Conference on Artificial Intelligence Last Checked 3 months ago

Abstract

Despite significant advancements in automated report generation, the opaqueness of text interpretability continues to cast doubt on the reliability of the content produced. This paper introduces a novel approach to identify specific image features in X-ray images that influence the outputs of report generation models. Specifically, we propose Cyclic Vision-Language Manipulator CVLM, a module to generate a manipulated X-ray from an original X-ray and its report from a designated report generator. The essence of CVLM is that cycling manipulated X-rays to the report generator produces altered reports aligned with the alterations pre-injected into the reports for X-ray generation, achieving the term "cyclic manipulation". This process allows direct comparison between original and manipulated X-rays, clarifying the critical image features driving changes in reports and enabling model users to assess the reliability of the generated texts. Empirical evaluations demonstrate that CVLM can identify more precise and reliable features compared to existing explanation methods, significantly enhancing the transparency and applicability of AI-generated reports.