Clustering Via Crowdsourcing
April 07, 2016 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Arya Mazumdar, Barna Saha
arXiv ID
1604.01839
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.IT, cs.LG
Citations
20
Venue
arXiv.org
Last Checked
3 months ago
Abstract
In recent years, crowdsourcing, aka human aided computation has emerged as an effective platform for solving problems that are considered complex for machines alone. Using human is time-consuming and costly due to monetary compensations. Therefore, a crowd based algorithm must judiciously use any information computed through an automated process, and ask minimum number of questions to the crowd adaptively. One such problem which has received significant attention is {\em entity resolution}. Formally, we are given a graph $G=(V,E)$ with unknown edge set $E$ where $G$ is a union of $k$ (again unknown, but typically large $O(n^\alpha)$, for $\alpha>0$) disjoint cliques $G_i(V_i, E_i)$, $i =1, \dots, k$. The goal is to retrieve the sets $V_i$s by making minimum number of pair-wise queries $V \times V\to\{\pm1\}$ to an oracle (the crowd). When the answer to each query is correct, e.g. via resampling, then this reduces to finding connected components in a graph. On the other hand, when crowd answers may be incorrect, it corresponds to clustering over minimum number of noisy inputs. Even, with perfect answers, a simple lower and upper bound of $\Theta(nk)$ on query complexity can be shown. A major contribution of this paper is to reduce the query complexity to linear or even sublinear in $n$ when mild side information is provided by a machine, and even in presence of crowd errors which are not correctable via resampling. We develop new information theoretic lower bounds on the query complexity of clustering with side information and errors, and our upper bounds closely match with them. Our algorithms are naturally parallelizable, and also give near-optimal bounds on the number of adaptive rounds required to match the query complexity.
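The simple $O(nk)$ upper bound mentioned in the abstract for the perfect-answer case can be illustrated with a minimal sketch. This is the folklore baseline (compare each item against one representative per cluster found so far), not the paper's side-information algorithm, and it assumes an error-free oracle. The names `cluster_with_oracle`, `same_cluster`, and the toy `labels` ground truth are hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def cluster_with_oracle(
    items: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[List[Hashable]], int]:
    """Recover clusters using pairwise queries to a perfect oracle.

    Each item is compared against one representative (the first member)
    of every cluster discovered so far, so at most k queries are spent
    per item and at most n * k queries in total.
    """
    clusters: List[List[Hashable]] = []
    queries = 0
    for v in items:
        for cluster in clusters:
            queries += 1
            if same_cluster(cluster[0], v):
                cluster.append(v)
                break
        else:
            # No existing representative matched: v starts a new cluster.
            clusters.append([v])
    return clusters, queries


# Hypothetical ground truth standing in for the crowd (assumed error-free).
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
found, used = cluster_with_oracle(
    list(labels), lambda u, v: labels[u] == labels[v]
)
print(found, used)
```

On this toy input the query count stays below $n \cdot k = 15$. With noisy answers, a single query per representative is no longer enough, which is the harder regime the abstract addresses.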
Similar Papers
In the same crypt: Data Structures & Algorithms
The Cartographer
R.I.P.
Ghosted
Route Planning in Transportation Networks
R.I.P.
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
Ghosted
Graph Isomorphism in Quasipolynomial Time
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted