Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems

January 22, 2020 Β· Declared Dead Β· πŸ› International Conference on Intelligent User Interfaces

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Zana BuΓ§inca, Phoebe Lin, Krzysztof Z. Gajos, Elena L. Glassman arXiv ID 2001.08298 Category cs.AI: Artificial Intelligence Cross-listed cs.HC Citations 329 Venue International Conference on Intelligent User Interfaces Last Checked 3 months ago
Abstract
Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet, current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: (1) using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks. Further, the subjective measures on evaluations with actual decision-making tasks did not predict the objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Artificial Intelligence

Died the same way β€” πŸ‘» Ghosted