Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model

April 23, 2026 · Grace Period · 🏛 NeurIPS 2025

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Runheng Liu, Heyan Huang, Xingchen Xiao, Zhijing Wu
arXiv ID: 2604.21223
Category: cs.CL (Computation & Language)
Cross-listed: cs.AI
Citations: 0
Venue: NeurIPS 2025
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their ability to generate human-like text has raised concerns about potential misuse. This underscores the need for reliable and effective methods to detect LLM-generated text. In this paper, we propose IRM, a novel zero-shot approach that leverages Implicit Reward Models for LLM-generated text detection. Such implicit reward models can be derived from publicly available instruction-tuned and base models. Previous reward-based methods rely on preference construction and task-specific fine-tuning. In comparison, IRM requires neither preference collection nor additional training. We evaluate IRM on the DetectRL benchmark and demonstrate that it achieves superior detection performance, outperforming existing zero-shot and supervised methods in LLM-generated text detection.
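The abstract does not spell out how the implicit reward is computed. One common reading, borrowed from the DPO literature and assumed here rather than taken from the paper, is a scaled log-likelihood ratio between the instruction-tuned model and its base model. The minimal sketch below uses hand-written per-token log-probabilities in place of real model outputs. The names `implicit_reward` and `looks_generated`, the `beta` scale, the zero threshold, and the convention that a higher score points to LLM-generated text are all hypothetical choices for illustration.

```python
from typing import Sequence


def implicit_reward(
    logp_tuned: Sequence[float],
    logp_base: Sequence[float],
    beta: float = 1.0,
) -> float:
    """DPO-style implicit reward: beta * (log p_tuned(text) - log p_base(text)).

    Both inputs are per-token log-probabilities of the SAME token sequence,
    so this assumes the two models share a tokenizer.
    """
    if len(logp_tuned) != len(logp_base):
        raise ValueError("both models must score the same token sequence")
    # Summing per-token log-probs gives the sequence log-likelihood.
    return beta * (sum(logp_tuned) - sum(logp_base))


def looks_generated(reward: float, threshold: float = 0.0) -> bool:
    """Hypothetical decision rule: flag text whose reward exceeds a threshold."""
    return reward > threshold


# Toy log-probabilities (made up for illustration, not real model outputs).
tuned_logps = [-1.0, -0.5, -0.25]
base_logps = [-2.0, -1.5, -0.75]

score = implicit_reward(tuned_logps, base_logps)
flagged = looks_generated(score)
```

In practice the per-token log-probabilities would come from scoring the candidate text with both models, and the threshold would have to be chosen on held-out data; neither step is described in the abstract.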