Systematic Assessment of Tabular Data Synthesis

February 09, 2024 · Declared Dead · 🏛 Conference on Computer and Communications Security

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Yuntao Du, Ninghui Li
arXiv ID: 2402.06806
Category: cs.CR (Cryptography & Security)
Cross-listed: cs.DB, cs.LG
Citations: 12
Venue: Conference on Computer and Communications Security
Last Checked: 3 months ago

Abstract

Data synthesis has been advocated as an important approach for utilizing data while protecting data privacy. In recent years, a plethora of tabular data synthesis algorithms (i.e., synthesizers) have been proposed. Some synthesizers satisfy Differential Privacy, while others aim to provide privacy in a heuristic fashion. A comprehensive understanding of the strengths and weaknesses of these synthesizers remains elusive due to drawbacks in evaluation metrics and missing head-to-head comparisons of newly developed synthesizers that take advantage of diffusion models and large language models with state-of-the-art statistical synthesizers. In this paper, we present a systematic evaluation framework for assessing tabular data synthesis algorithms. Specifically, we examine and critique existing evaluation metrics, and introduce a set of new metrics in terms of fidelity, privacy, and utility to address their limitations. We conducted extensive evaluations of 8 different types of synthesizers on 12 real-world datasets and identified some interesting findings, which offer new directions for privacy-preserving data synthesis.



📜 Similar Papers

In the same crypt — Cryptography & Security

R.I.P. 👻 Ghosted

Towards Evaluating the Robustness of Neural Networks

Nicholas Carlini, David Wagner

cs.CR 🏛 IEEE S&P 📚 9.5K cites 9 years ago

R.I.P. 👻 Ghosted

Membership Inference Attacks against Machine Learning Models

Reza Shokri, Marco Stronati, ... (+2 more)

cs.CR 🏛 IEEE S&P 📚 4.9K cites 9 years ago

R.I.P. 👻 Ghosted

The Limitations of Deep Learning in Adversarial Settings

Nicolas Papernot, Patrick McDaniel, ... (+4 more)

cs.CR 🏛 IEEE S&P 📚 4.2K cites 10 years ago

R.I.P. 👻 Ghosted

Practical Black-Box Attacks against Machine Learning

Nicolas Papernot, Patrick McDaniel, ... (+4 more)

cs.CR 🏛 ASIACCS 📚 3.9K cites 10 years ago

R.I.P. 👻 Ghosted

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Nicolas Papernot, Patrick McDaniel, ... (+3 more)

cs.CR 🏛 IEEE S&P 📚 3.2K cites 10 years ago

R.I.P. 👻 Ghosted

Extracting Training Data from Large Language Models

Nicholas Carlini, Florian Tramer, ... (+10 more)

cs.CR 🏛 USENIX Sec 📚 2.6K cites 5 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago