Med-MMHL: A Multi-Modal Dataset for Detecting Human- and LLM-Generated Misinformation in the Medical Domain

June 15, 2023 · Declared Dead · 🏛 arXiv.org

Authors Yanshen Sun, Jianfeng He, Shuo Lei, Limeng Cui, Chang-Tien Lu arXiv ID 2306.08871 Category cs.SI: Social & Info Networks Cross-listed cs.CY Citations 21 Venue arXiv.org Repository https://github.com/styxsys0927/Med-MMHL} Last Checked 1 month ago

Abstract

The pervasive influence of misinformation has far-reaching and detrimental effects on both individuals and society. The COVID-19 pandemic has witnessed an alarming surge in the dissemination of medical misinformation. However, existing datasets pertaining to misinformation predominantly focus on textual information, neglecting the inclusion of visual elements, and tend to center solely on COVID-19-related misinformation, overlooking misinformation surrounding other diseases. Furthermore, the potential of Large Language Models (LLMs), such as the ChatGPT developed in late 2022, in generating misinformation has been overlooked in previous works. To overcome these limitations, we present Med-MMHL, a novel multi-modal misinformation detection dataset in a general medical domain encompassing multiple diseases. Med-MMHL not only incorporates human-generated misinformation but also includes misinformation generated by LLMs like ChatGPT. Our dataset aims to facilitate comprehensive research and development of methodologies for detecting misinformation across diverse diseases and various scenarios, including human and LLM-generated misinformation detection at the sentence, document, and multi-modal levels. To access our dataset and code, visit our GitHub repository: \url{https://github.com/styxsys0927/Med-MMHL}.