Crowdsourcing subjective annotations using pairwise comparisons reduces bias and error compared to the majority-vote method
May 31, 2023 ยท Declared Dead ยท ๐ Proc. ACM Hum. Comput. Interact.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hasti Narimanzadeh, Arash Badie-Modiri, Iuliia Smirnova, Ted Hsuan Yun Chen
arXiv ID
2305.20042
Category
cs.HC: Human-Computer Interaction
Cross-listed
cs.CY
Citations
19
Venue
Proc. ACM Hum. Comput. Interact.
Last Checked
3 months ago
Abstract
How to better reduce measurement variability and bias introduced by subjectivity in crowdsourced labelling remains an open question. We introduce a theoretical framework for understanding how random error and measurement bias enter into crowdsourced annotations of subjective constructs. We then propose a pipeline that combines pairwise comparison labelling with Elo scoring, and demonstrate that it outperforms the ubiquitous majority-voting method in reducing both types of measurement error. To assess the performance of the labelling approaches, we constructed an agent-based model of crowdsourced labelling that lets us introduce different types of subjectivity into the tasks. We find that under most conditions with task subjectivity, the comparison approach produced higher $f_1$ scores. Further, the comparison approach is less susceptible to inflating bias, which majority voting tends to do. To facilitate applications, we show with simulated and real-world data that the number of required random comparisons for the same classification accuracy scales log-linearly $O(N \log N)$ with the number of labelled items. We also implemented the Elo system as an open-source Python package.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Human-Computer Interaction
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Improving fairness in machine learning systems: What do industry practitioners need?
R.I.P.
๐ป
Ghosted
Identifying Stable Patterns over Time for Emotion Recognition from EEG
R.I.P.
๐ป
Ghosted
Questioning the AI: Informing Design Practices for Explainable AI User Experiences
R.I.P.
๐ป
Ghosted
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
R.I.P.
๐ป
Ghosted
Educational data mining and learning analytics: An updated survey
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted