Reasoning Structure of Large Language Models

June 02, 2026 · Grace Period · 🏛 ICML 2026 and presented at the ICLR 2026 workshop on LLM reasoning

⏳ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Frédéric Berdoz, Luca A. Lanzendörfer, Fabian Farestam, Roger Wattenhofer
arXiv ID: 2606.03883
Category: cs.AI (Artificial Intelligence)
Cross-listed: cs.LG
Citations: 0
Venue: ICML 2026 and presented at the ICLR 2026 workshop on LLM reasoning
Abstract
Large reasoning models (LRMs) are often evaluated using metrics such as final-answer accuracy or token count. However, identical scores on these metrics can hide fundamentally different reasoning structures. To address this limitation, we introduce a scalable LRM benchmark of logic puzzles and a pipeline that converts unstructured traces into verifiable reasoning graphs of claims and dependencies. This turns reasoning into a structured, measurable object whose topology can be quantitatively analyzed. Building on this, we define a reasoning efficiency metric that quantifies how concentrated the model's logical flow is. Our analysis on open-source reasoning models shows that structural measurements separate behaviors that token count and accuracy conflate, providing a practical tool for diagnosing failure modes and comparing how reasoning scales with puzzle difficulty.
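The abstract does not define the reasoning graph or the efficiency metric, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a reasoning graph can be stored as a mapping from each claim to the claims it depends on, and it uses a stand-in measure (the share of claims the final answer transitively depends on) in place of the paper's metric. The names `used_claims`, `concentration`, and the toy `trace` are hypothetical.

```python
from collections import deque


def used_claims(depends_on, final):
    """Return every claim the final answer transitively depends on (itself included)."""
    seen = {final}
    queue = deque([final])
    while queue:
        claim = queue.popleft()
        for premise in depends_on.get(claim, []):
            if premise not in seen:
                seen.add(premise)
                queue.append(premise)
    return seen


def concentration(depends_on, final):
    """Stand-in measure (an assumption, not the paper's metric):
    fraction of all claims that feed into the final answer."""
    all_claims = set(depends_on) | {p for ps in depends_on.values() for p in ps}
    all_claims.add(final)
    return len(used_claims(depends_on, final)) / len(all_claims)


# Hypothetical toy trace: claim -> claims it depends on.
trace = {
    "c1": [],
    "c2": [],
    "c3": ["c1", "c2"],
    "c4": ["c1"],        # derived but never used by the answer
    "answer": ["c3"],
}

score = concentration(trace, "answer")
print(score)
```

In this toy trace, four of the five claims feed the answer. A second trace could reach the same answer with a different share of unused claims, which is the kind of structural difference the abstract says accuracy and token count can hide.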