The Relational Data Borg is Learning
August 18, 2020 ยท Declared Dead ยท ๐ Proceedings of the VLDB Endowment
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Dan Olteanu
arXiv ID
2008.07864
Category
cs.DB: Databases
Cross-listed
cs.LG
Citations
27
Venue
Proceedings of the VLDB Endowment
Last Checked
3 months ago
Abstract
This paper overviews an approach that addresses machine learning over relational data as a database problem. This is justified by two observations. First, the input to the learning task is commonly the result of a feature extraction query over the relational data. Second, the learning task requires the computation of group-by aggregates. This approach has been already investigated for a number of supervised and unsupervised learning tasks, including: ridge linear regression, factorisation machines, support vector machines, decision trees, principal component analysis, and k-means; and also for linear algebra over data matrices. The main message of this work is that the runtime performance of machine learning can be dramatically boosted by a toolbox of techniques that exploit the knowledge of the underlying data. This includes theoretical development on the algebraic, combinatorial, and statistical structure of relational data processing and systems development on code specialisation, low-level computation sharing, and parallelisation. These techniques aim at lowering both the complexity and the constant factors of the learning time. This work is the outcome of extensive collaboration of the author with colleagues from RelationalAI, in particular Mahmoud Abo Khamis, Molham Aref, Hung Ngo, and XuanLong Nguyen, and from the FDB research project, in particular Ahmet Kara, Milos Nikolic, Maximilian Schleich, Amir Shaikhha, Jakub Zavodny, and Haozhe Zhang. The author would also like to thank the members of the FDB project for the figures and examples used in this paper. The author is grateful for support from industry: Amazon Web Services, Google, Infor, LogicBlox, Microsoft Azure, RelationalAI; and from the funding agencies EPSRC and ERC. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 682588.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Databases
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
The Case for Learned Index Structures
R.I.P.
๐ป
Ghosted
Untangling Blockchain: A Data Processing View of Blockchain Systems
R.I.P.
๐ป
Ghosted
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades
R.I.P.
๐ป
Ghosted
BLOCKBENCH: A Framework for Analyzing Private Blockchains
R.I.P.
๐ป
Ghosted
Data Synthesis based on Generative Adversarial Networks
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted