Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts

September 01, 2018 · Entered Twilight · 🏛 Conference on Empirical Methods in Natural Language Processing

"Last commit was 7.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, CODE_README.md, LICENSE, README.md, deliberativeness, lime, nn, rationale, utils

Authors: Samuel Carton, Qiaozhu Mei, Paul Resnick
arXiv ID: 1809.01499
Category: cs.CL (Computation & Language)
Cross-listed: cs.IR, cs.LG, stat.ML
Citations: 34
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/shcarton/rcnn (⭐ 1)
Last Checked: 1 month ago

Abstract

We introduce an adversarial method for producing high-recall explanations of neural text classifier decisions. Building on an existing architecture for extractive explanations via hard attention, we add an adversarial layer which scans the residual of the attention for remaining predictive signal. Motivated by the important domain of detecting personal attacks in social media comments, we additionally demonstrate the importance of manually setting a semantically appropriate 'default' behavior for the model by explicitly manipulating its bias term. We develop a validation set of human-annotated personal attacks to evaluate the impact of these changes.
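
The two ideas in the abstract, scanning the residual of a hard-attention mask for leftover signal and fixing the model's default output through its bias term, can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words scorer, the names `split_by_mask`, `bias_for_default`, `predict`, and `TOY_WEIGHTS`, the example comment, and the 0.05 default probability are all assumptions made for illustration, and the neural extractor and adversary from the paper are replaced by fixed masks and a logistic scorer.

```python
import math

def split_by_mask(tokens, mask):
    """Split tokens into the extracted rationale and its residual (the unselected tokens)."""
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    rationale = [t for t, m in zip(tokens, mask) if m]
    residual = [t for t, m in zip(tokens, mask) if not m]
    return rationale, residual

def bias_for_default(p_default):
    """Bias (logit) that makes a logistic output equal p_default on empty input."""
    return math.log(p_default / (1.0 - p_default))

def predict(tokens, weights, bias):
    """Toy logistic bag-of-words scorer standing in for a neural classifier."""
    logit = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weights: two attack words carry all the predictive signal.
TOY_WEIGHTS = {"idiot": 4.0, "moron": 4.0}
# Assumed default behavior: with no evidence, predict "attack" with probability 0.05.
BIAS = bias_for_default(0.05)

tokens = ["you", "are", "an", "idiot", "and", "a", "moron"]
low_recall_mask = [0, 0, 0, 1, 0, 0, 0]   # selects only "idiot"
high_recall_mask = [0, 0, 0, 1, 0, 0, 1]  # selects both attack words

_, low_residual = split_by_mask(tokens, low_recall_mask)
_, high_residual = split_by_mask(tokens, high_recall_mask)

# The adversary's view: how much can still be predicted from what was left out?
leak_low = predict(low_residual, TOY_WEIGHTS, BIAS)
leak_high = predict(high_residual, TOY_WEIGHTS, BIAS)
```

In this sketch the low-recall mask leaves "moron" in the residual, so the residual score stays high, while the high-recall mask leaves nothing predictive behind and the residual score falls back to the default set by the bias. An adversarial objective of this kind would penalize the first mask and favor the second.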