Towards Interpretable and Learnable Risk Analysis for Entity Resolution

December 06, 2019 · Declared Dead · 🏛 SIGMOD Conference

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Zhaoqiang Chen, Qun Chen, Boyi Hou, Tianyi Duan, Zhanhuai Li, Guoliang Li
arXiv ID: 1912.02947
Category: cs.DB (Databases)
Cross-listed: cs.LG
Citations: 23
Venue: SIGMOD Conference
Last Checked: 3 months ago

Abstract

Machine-learning-based entity resolution has been widely studied. However, machine learning models may mislabel some entity pairs, and existing studies do not address the risk analysis problem: predicting and interpreting which entity pairs are mislabeled. In this paper, we propose an interpretable and learnable framework for risk analysis, which aims to rank the labeled pairs by their risk of being mislabeled. We first describe how to automatically generate interpretable risk features, and then present a learnable risk model and its training technique. Finally, we empirically evaluate the proposed approach on real data. Our extensive experiments show that the learned risk model can identify mislabeled pairs with considerably higher accuracy than the existing alternatives.
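To make the task in the abstract concrete, the sketch below ranks labeled pairs by an estimated risk of being mislabeled. It is not the paper's risk model, which the abstract does not specify. It assumes a naive confidence-based baseline in which the classifier's match probability is treated as calibrated. The names `LabeledPair`, `baseline_risk`, and `rank_by_risk` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledPair:
    pair_id: str
    match_probability: float  # classifier's estimated probability that the pair is a match
    label: int                # label assigned by the classifier: 1 = match, 0 = non-match


def baseline_risk(pair: LabeledPair) -> float:
    """Estimated probability that the assigned label is wrong.

    Assumption: the classifier's probability is calibrated. This is a
    simple baseline, not the learnable risk model proposed in the paper.
    """
    p = pair.match_probability
    return 1.0 - p if pair.label == 1 else p


def rank_by_risk(pairs: List[LabeledPair]) -> List[LabeledPair]:
    """Return the pairs ordered from highest to lowest estimated risk."""
    return sorted(pairs, key=baseline_risk, reverse=True)
```

A reviewer with a limited budget would inspect pairs from the top of this ranking first. The abstract's proposal replaces the fixed scoring rule with interpretable risk features and a risk model trained on data.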

📜 Similar Papers

In the same crypt — Databases

R.I.P. 👻 Ghosted

Datasheets for Datasets

Timnit Gebru, Jamie Morgenstern, ... (+5 more)

cs.DB 🏛 CACM 📚 2.6K cites 8 years ago

R.I.P. 👻 Ghosted

The Case for Learned Index Structures

Tim Kraska, Alex Beutel, ... (+3 more)

cs.DB 🏛 SIGMOD 📚 1.2K cites 8 years ago

R.I.P. 👻 Ghosted

Untangling Blockchain: A Data Processing View of Blockchain Systems

Tien Tuan Anh Dinh, Rui Liu, ... (+4 more)

cs.DB 🏛 IEEE TKDE 📚 997 cites 8 years ago

R.I.P. 👻 Ghosted

Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades

Garrick Orchard, Ajinkya Jayawant, ... (+2 more)

cs.DB 🏛 Frontiers in Neuroscience 📚 905 cites 10 years ago

R.I.P. 👻 Ghosted

BLOCKBENCH: A Framework for Analyzing Private Blockchains

Tien Tuan Anh Dinh, Ji Wang, ... (+4 more)

cs.DB 🏛 SIGMOD 📚 872 cites 9 years ago

R.I.P. 👻 Ghosted

Data Synthesis based on Generative Adversarial Networks

Noseong Park, Mahmoud Mohammadi, ... (+4 more)

cs.DB 🏛 VLDB 📚 568 cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago