S/C: Speeding up Data Materialization with Bounded Memory
March 17, 2023 · Declared Dead · IEEE International Conference on Data Engineering
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zhaoheng Li, Xinyu Pi, Yongjoo Park
arXiv ID
2303.09774
Category
cs.DB: Databases
Citations
5
Venue
IEEE International Conference on Data Engineering
Last Checked
3 months ago
Abstract
With data pipeline tools and the expressiveness of SQL, managing interdependent materialized views (MVs) is becoming increasingly easy. These MVs are updated repeatedly upon new data ingestion (e.g., daily), from which database admins can observe performance metrics (e.g., refresh time of each MV, size on disk) in a consistent way for different types of updates (full vs. incremental) and for different systems (single node, distributed, cloud-hosted). One missed opportunity is that existing data systems treat those MV updates as independent SQL statements without fully exploiting their dependency information and performance metrics. However, if we know that the result of a SQL statement will be consumed immediately afterward by subsequent operations, those subsequent operations do not have to wait until the early results are fully materialized on storage because the results are already readily available in memory. Of course, this may come at a cost because keeping results in memory (even temporarily) will reduce the amount of available memory; thus, our decision should be careful. In this paper, we introduce a new system, called S/C, which tackles this problem through efficient creation and update of a set of MVs with acyclic dependencies among them. S/C judiciously uses bounded memory to reduce end-to-end MV refresh time by short-circuiting expensive reads and writes; S/C's objective function accurately estimates time savings from keeping intermediate data in memory for particular periods. Our solution jointly optimizes an MV refresh order, what data to keep in memory, and when to release data from memory. At a high level, S/C still materializes all data exactly as defined in MV definitions; thus, it doesn't impact any service-level agreements. In our experiments with TPC-DS datasets (up to 1TB), we show S/C's optimization can speed up end-to-end runtime by 1.04x-5.08x with 1.6GB memory.
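The idea in the abstract (refresh MVs in dependency order, keep some results in bounded memory, release them once their consumers are done) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not S/C's optimizer: it uses a simple greedy rule instead of the joint optimization the abstract describes, and the function name `simulate_refresh`, the `deps`/`sizes` inputs, and the unit-less sizes are all hypothetical.

```python
from graphlib import TopologicalSorter


def simulate_refresh(deps, sizes, budget):
    """Toy model: count parent reads served from memory vs. storage.

    deps   -- hypothetical input: maps each MV to the set of MVs it reads
    sizes  -- hypothetical input: maps each MV to its result size
    budget -- memory bound, in the same units as sizes
    """
    # Refresh order that respects the acyclic dependencies.
    order = list(TopologicalSorter(deps).static_order())

    # Number of downstream MVs that still need each result.
    pending = {mv: 0 for mv in order}
    for parents in deps.values():
        for parent in parents:
            pending[parent] += 1

    cached, used = set(), 0
    memory_reads = storage_reads = 0
    for mv in order:
        for parent in deps.get(mv, ()):
            if parent in cached:
                memory_reads += 1
            else:
                storage_reads += 1
            pending[parent] -= 1
            # Release a cached result once its last consumer has read it.
            if pending[parent] == 0 and parent in cached:
                cached.remove(parent)
                used -= sizes[parent]
        # Every MV is still written to storage in this model; greedily keep
        # a copy in memory only if a consumer remains and it fits the budget.
        if pending[mv] > 0 and used + sizes[mv] <= budget:
            cached.add(mv)
            used += sizes[mv]
    return order, memory_reads, storage_reads


deps = {"b": {"a"}, "c": {"a", "b"}, "d": {"c"}}
sizes = {"a": 1, "b": 1, "c": 1, "d": 1}
print(simulate_refresh(deps, sizes, budget=2))
```

Shrinking `budget` in this toy model shifts reads from memory back to storage, which is the trade-off the abstract says must be decided carefully.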
Similar Papers
In the same crypt: Databases
Untangling Blockchain: A Data Processing View of Blockchain Systems · R.I.P. · Ghosted
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades · R.I.P. · Ghosted
BLOCKBENCH: A Framework for Analyzing Private Blockchains · R.I.P. · Ghosted
Data Synthesis based on Generative Adversarial Networks · R.I.P. · Ghosted
HoloClean: Holistic Data Repairs with Probabilistic Inference · R.I.P. · Ghosted
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency · R.I.P. · Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit · R.I.P. · Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning · R.I.P. · Ghosted