AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

October 24, 2023 Β· Entered Twilight Β· πŸ› arXiv.org

πŸ’€ TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: Codes + Descriptions, README.md, Real-data.zip, correlation_plots.zip

Authors Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng arXiv ID 2310.15479 Category stat.ML: Machine Learning (Stat) Cross-listed cs.AI, cs.LG Citations 28 Venue arXiv.org Repository https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion ⭐ 15 Last Checked 1 month ago
Abstract
Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model for generating synthetic tabular data. The heterogeneous features in tabular data have been main obstacles in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the resulting synthetic tables from our model show nice statistical fidelities to the real data, and perform well in downstream tasks for machine learning utilities. We conducted the experiments over $15$ publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available at https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Machine Learning (Stat)

R.I.P. πŸ‘» Ghosted

Graph Attention Networks

Petar VeličkoviΔ‡, Guillem Cucurull, ... (+4 more)

stat.ML πŸ› ICLR πŸ“š 24.7K cites 8 years ago
R.I.P. πŸ‘» Ghosted

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

stat.ML πŸ› arXiv πŸ“š 12.0K cites 9 years ago