CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning

March 17, 2026 · Grace Period · 🏛 ICLR 2026 Workshop

⏳ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Weikun K. Zhang, Rohan Pandey, Bhaumik Mehta, Kaijie Jin, Naomi Morato, Archit Ganapule, Michael Ruofan Zeng, Jarod Alper
arXiv ID: 2603.17075
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, cs.CC
Citations: 0
Venue: ICLR 2026 Workshop
Abstract
Motivated by auto-proof generation and Valiant's VP vs. VNP conjecture, we study the problem of discovering efficient arithmetic circuits to compute polynomials, using addition and multiplication gates. We formulate this problem as a single-player game, where an RL agent attempts to build the circuit within a fixed number of operations. We implement an AlphaZero-style training loop and compare two approaches: Proximal Policy Optimization with Monte Carlo Tree Search (PPO+MCTS) and Soft Actor-Critic (SAC). SAC achieves the highest success rates on two-variable targets, while PPO+MCTS scales to three variables and demonstrates steady improvement on harder instances. These results suggest that polynomial circuit synthesis is a compact, verifiable setting for studying self-improving search policies.
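The single-player game described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the sparse dictionary representation of polynomials, the starting nodes (the constant 1 plus one node per variable), the sparse 0/1 reward, and the names `CircuitGame`, `add`, `mul`, and `step` are all hypothetical. The sketch only shows the general shape of the problem: each action appends one addition or multiplication gate over two existing nodes, and the episode ends when the target polynomial appears or the operation budget runs out.

```python
from collections import defaultdict
from typing import Dict, Tuple

Monomial = Tuple[int, ...]   # exponent vector, e.g. (2, 1) means x^2 * y
Poly = Dict[Monomial, int]   # monomial -> nonzero integer coefficient


def add(p: Poly, q: Poly) -> Poly:
    """Sum of two sparse polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return {m: c for m, c in out.items() if c != 0}


def mul(p: Poly, q: Poly) -> Poly:
    """Product of two sparse polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(a + b for a, b in zip(m1, m2))
            out[mono] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


class CircuitGame:
    """Hypothetical environment: build a circuit for `target` in at most `max_steps` gates."""

    def __init__(self, target: Poly, num_vars: int, max_steps: int):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0
        self.done = False
        # Assumed starting nodes: node 0 is the constant 1, node k+1 is variable k.
        one = {tuple(0 for _ in range(num_vars)): 1}
        variables = [
            {tuple(1 if i == k else 0 for i in range(num_vars)): 1}
            for k in range(num_vars)
        ]
        self.nodes = [one] + variables

    def step(self, op: str, i: int, j: int) -> Tuple[float, bool]:
        """Append gate `op` over nodes i and j; return (reward, done)."""
        if self.done:
            raise RuntimeError("episode is over")
        if op not in ("add", "mul"):
            raise ValueError(f"unknown gate: {op}")
        gate = add if op == "add" else mul
        new_node = gate(self.nodes[i], self.nodes[j])
        self.nodes.append(new_node)
        self.steps += 1
        solved = new_node == self.target
        self.done = solved or self.steps >= self.max_steps
        # Assumed sparse reward: 1 only when the target is produced.
        return (1.0 if solved else 0.0), self.done


# Usage: x^2 + 2xy + y^2 needs only two gates, (x + y) followed by a squaring.
target = {(2, 0): 1, (1, 1): 2, (0, 2): 1}
game = CircuitGame(target, num_vars=2, max_steps=4)
game.step("add", 1, 2)                  # node 3 = x + y
reward, done = game.step("mul", 3, 3)   # node 4 = (x + y)^2
```

In this toy instance the expanded form has three monomials, yet the circuit uses one addition and one multiplication, which is the kind of saving a search policy would be rewarded for finding. Because success is checked by exact polynomial equality, any circuit the agent produces can be verified directly.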