DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward

May 12, 2025 ยท Declared Dead ยท ๐Ÿ› Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

๐Ÿ“œ CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: LICENSE, README.md

Authors Yi Zhang, Ruihong Qiu, Xuwei Xu, Jiajun Liu, Sen Wang arXiv ID 2505.07257 Category cs.IR: Information Retrieval Citations 4 Venue Annual International ACM SIGIR Conference on Research and Development in Information Retrieval Repository https://github.com/ArronDZhang/DARLR โญ 4 Last Checked 1 month ago
Abstract
Model-based offline reinforcement learning (RL) has emerged as a promising approach for recommender systems, enabling effective policy learning by interacting with frozen world models. However, the reward functions in these world models, trained on sparse offline logs, often suffer from inaccuracies. Specifically, existing methods face two major limitations in addressing this challenge: (1) deterministic use of reward functions as static look-up tables, which propagates inaccuracies during policy learning, and (2) static uncertainty designs that fail to effectively capture decision risks and mitigate the impact of these inaccuracies. In this work, a dual-agent framework, DARLR, is proposed to dynamically update world models to enhance recommendation policies. To achieve this, a \textbf{\textit{selector}} is introduced to identify reference users by balancing similarity and diversity so that the \textbf{\textit{recommender}} can aggregate information from these users and iteratively refine reward estimations for dynamic reward shaping. Further, the statistical features of the selected users guide the dynamic adaptation of an uncertainty penalty to better align with evolving recommendation requirements. Extensive experiments on four benchmark datasets demonstrate the superior performance of DARLR, validating its effectiveness. The code is available at https://github.com/ArronDZhang/DARLR.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Information Retrieval

Died the same way โ€” ๐Ÿ“œ Death by README