On Efficient Approximate Aggregate Nearest Neighbor Queries over Learned Representations

February 26, 2025 · Declared Dead · 🏛 Proc. ACM Manag. Data, Vol. 4, No. 1 (SIGMOD), Article 58. Publication date: February 2026

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Carrie Wang, Sihem Amer-Yahia, Laks V. S. Lakshmanan, Reynold Cheng arXiv ID 2502.18803 Category cs.DS: Data Structures & Algorithms Cross-listed cs.DB, cs.IR Citations 0 Venue Proc. ACM Manag. Data, Vol. 4, No. 1 (SIGMOD), Article 58. Publication date: February 2026 Last Checked 3 months ago

Abstract

We study Aggregation Queries over Nearest Neighbors (AQNN), which compute aggregates over the learned representations of the neighborhood of a designated query object. For example, a medical professional may be interested in the average heart rate of patients whose representations are similar to that of an insomnia patient. Answering AQNNs accurately and efficiently is challenging due to the high cost of generating high-quality representations (e.g., via a deep learning model trained on human expert annotations) and the different sensitivities of different aggregation functions to neighbor selection errors. We address these challenges by combining high-quality and low-cost representations to approximate the aggregate. We characterize value- and count-sensitive AQNNs and propose the Sampler with Precision-Recall in Target (SPRinT), a query answering framework that works in three steps: (1) sampling, (2) nearest neighbor selection, and (3) aggregation. We further establish theoretical bounds on sample sizes and aggregation errors. Extensive experiments on five datasets from three domains (medical, social media, and e-commerce) demonstrate that SPRinT achieves the lowest aggregation error with minimal computation cost in most cases compared to existing solutions. SPRinT's performance remains stable as dataset size grows, confirming its scalability for large-scale applications requiring both accuracy and efficiency.