TUCKET: A Tensor Time Series Data Structure for Efficient and Accurate Factor Analysis over Time Ranges

January 11, 2025 · Declared Dead · 🏛 Proceedings of the VLDB Endowment

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Ruizhong Qiu, Jun-Gi Jang, Xiao Lin, Lihui Liu, Hanghang Tong arXiv ID 2501.06647 Category cs.DS: Data Structures & Algorithms Citations 11 Venue Proceedings of the VLDB Endowment Last Checked 4 months ago

Abstract

Tucker decomposition has been widely used in a variety of applications to obtain latent factors of tensor data. In these applications, a common need is to compute Tucker decomposition for a given time range. Furthermore, real-world tensor time series are typically evolving in the time dimension. Such needs call for a data structure that can efficiently and accurately support range queries of Tucker decomposition and stream updates. Unfortunately, existing methods do not support either range queries or stream updates. This challenging problem has remained open for years prior to our work. To solve this challenging problem, we propose TUCKET, a data structure that can efficiently and accurately handle both range queries and stream updates. Our key idea is to design a new data structure that we call a stream segment tree by generalizing the segment tree, a data structure that was originally invented for computational geometry. For a range query of length $L$, our TUCKET can find $O(\log L)$ nodes (called the hit set) from the tree and efficiently stitch their preprocessed decompositions to answer the range query. We also propose an algorithm to optimally prune the hit set via an approximation of subtensor decomposition. For the $T$-th stream update, our TUCKET modifies only amortized $O(1)$ nodes and only $O(\log T)$ nodes in the worst case. Extensive evaluation demonstrates that our TUCKET consistently achieves the highest efficiency and accuracy across four large-scale datasets. Our TUCKET achieves at least 3 times lower latency and at least 1.4 times smaller reconstruction error than Zoom-Tucker on all datasets.