Zero-Shot Goal Recognition with Large Language Models

May 14, 2026 · Grace Period · NeurIPS 2026

Authors: Kin Max Piamolini Gusmão, Nathan Gavenski, Nir Oren, Felipe Meneguzzi
arXiv ID: 2605.15333
Category: cs.AI (Artificial Intelligence)

Abstract

Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on exploiting world knowledge rather than on genuine symbolic reasoning. Goal recognition is a complementary, abductive task that is structurally better suited to LLM strengths: it requires judging whether the observed behaviour is consistent with each candidate goal and with world knowledge, rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: the accuracy of some models improves as more observations become available, approaching that of landmark-based recognisers at full observability, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a fundamental difference in how models integrate evidence rather than differences in domain familiarity. These findings position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.
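For readers unfamiliar with the landmark-based baseline, such recognisers score each candidate goal by how many of its landmarks (facts that must become true on any plan reaching the goal) the observations have already achieved. The snippet below is a minimal sketch of that idea in the spirit of a goal-completion heuristic, not the paper's implementation. It assumes landmarks have already been extracted for each goal fact and that the observed actions have been reduced to a set of achieved facts; the function names, fact encoding, and toy blocks-world data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Fact = str
Goal = FrozenSet[Fact]


def goal_completion(goal: Goal, landmarks: Dict[Fact, Set[Fact]], achieved: Set[Fact]) -> float:
    """Average, over the goal's facts, of the fraction of each fact's landmarks already achieved.

    Hypothetical simplification: landmarks are given per goal fact, and each goal
    fact is treated as a landmark of itself.
    """
    ratios = []
    for fact in sorted(goal):
        lms = landmarks.get(fact, set()) | {fact}
        ratios.append(len(lms & achieved) / len(lms))
    return sum(ratios) / len(ratios) if ratios else 0.0


def recognise(candidates: List[Goal], landmarks: Dict[Fact, Set[Fact]], achieved: Set[Fact]) -> List[Goal]:
    """Return every candidate goal tied for the highest completion score."""
    scores = {g: goal_completion(g, landmarks, achieved) for g in candidates}
    best = max(scores.values())
    return [g for g in candidates if scores[g] == best]


# Toy example (hypothetical data): the observed actions picked up block a and cleared block b.
landmarks = {
    "on(a,b)": {"holding(a)", "clear(b)"},
    "on(a,c)": {"holding(a)", "clear(c)"},
}
candidates = [frozenset({"on(a,b)"}), frozenset({"on(a,c)"})]
achieved = {"holding(a)", "clear(b)"}

print(recognise(candidates, landmarks, achieved))
```

Here `on(a,b)` has two of its three landmarks achieved while `on(a,c)` has only one, so the first goal is ranked highest. Returning all tied goals rather than a single one keeps the sketch neutral about tie-breaking, which evaluation protocols may handle differently.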