ZeroS: Zero-Sum Linear Attention for Efficient Transformers
February 05, 2026 · Grace Period · NeurIPS 2025
Authors
Jiecheng Lu, Xu Han, Yan Sun, Viresh Pati, Yubin Kim, Siddhartha Somani, Shihao Yang
arXiv ID
2602.05230
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, stat.ML
Citations
0
Venue
NeurIPS 2025
Abstract
Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations of these approaches: the restriction to convex combinations, which permits only additive information blending, and a uniform accumulated-weight bias that dilutes attention in long contexts. We propose Zero-Sum Linear Attention (ZeroS), which addresses both limitations by removing the constant zero-order term $1/t$ and reweighting the remaining zero-sum softmax residuals. The resulting weights are mathematically stable, can take both positive and negative values, and allow a single attention layer to perform contrastive operations. While maintaining $O(N)$ complexity, ZeroS theoretically expands the set of representable functions beyond convex combinations. Empirically, it matches or exceeds standard softmax attention across various sequence modeling benchmarks.
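The abstract gives enough detail to sketch the core decomposition: the softmax weights over $t$ positions split into a uniform $1/t$ term plus residuals that sum to zero; ZeroS drops the uniform term and reweights the residuals into signed weights. The NumPy sketch below illustrates only this decomposition. The function name and the reweighting scalar `gamma` are illustrative assumptions, not the paper's implementation, and the explicit softmax here is quadratic per sequence, whereas the paper's linear-attention formulation keeps $O(N)$.

```python
# Minimal sketch of the zero-sum weight decomposition from the abstract.
# Assumptions (not from the paper): the function name and the scalar
# reweighting factor `gamma`. The explicit softmax makes this O(N^2);
# the paper's linear-attention form avoids that.
import numpy as np

def zero_sum_attention_step(q, K, V, gamma=1.0):
    """Attend a single query q over t keys/values with zero-sum weights.

    q: (d,) query; K: (t, d) keys; V: (t, d_v) values.
    """
    t, d = K.shape
    scores = K @ q / np.sqrt(d)        # scaled dot-product scores, shape (t,)
    a = np.exp(scores - scores.max())
    a = a / a.sum()                    # softmax weights: a_i >= 0, sum to 1
    r = a - 1.0 / t                    # zero-sum residuals: sum(r) == 0
    w = gamma * r                      # reweighted signed weights (can be < 0)
    return w @ V                       # contrastive combination of values

# Toy usage: the weights sum to ~0, so the output contrasts values
# against each other rather than blending them additively.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
out = zero_sum_attention_step(q, K, V)
```

Because the weights sum to zero, adding the same constant vector to every value leaves the output unchanged; this is the sense in which a single layer can act contrastively rather than as a convex average.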
Similar Papers
In the same crypt · Machine Learning
XGBoost: A Scalable Tree Boosting System
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Semi-Supervised Classification with Graph Convolutional Networks
Proximal Policy Optimization Algorithms