Old Age
1Cademy @ Causal News Corpus 2022: Leveraging Self-Training in Causality Classification of Socio-Political Event Data
November 04, 2022 · Entered Twilight · CASE
Repo contents: .gitignore, LICENSE, README.md, Subtask 1, data_augmentation, mingyuli, model_testing, multitask_learning, xingran
Authors
Adam Nik, Ge Zhang, Xingran Chen, Mingyu Li, Jie Fu
arXiv ID
2211.02729
Category
cs.CL: Computation & Language
Citations
4
Venue
CASE
Repository
https://github.com/Gzhang-umich/1CademyTeamOfCASE
⭐ 2
Last Checked
1 month ago
Abstract
This paper details our participation in the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) workshop @ EMNLP 2022, where we take part in Subtask 1 of Shared Task 3. We approach the given task of event causality detection by proposing a self-training pipeline that follows a teacher-student classifier method. More specifically, we initially train a teacher model on the true, original task data, and use that teacher model to self-label data to be used in the training of a separate student model for the final task prediction. We test how restricting the number of positive or negative self-labeled examples in the self-training process affects classification performance. Our final results show that using self-training produces a comprehensive performance improvement across all models and self-labeled training sets tested within the task of event causality sequence classification. On top of that, we find that self-training performance did not diminish even when restricting either positive or negative examples used in training. Our code is made publicly available at https://github.com/Gzhang-umich/1CademyTeamOfCASE.
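Below is a minimal sketch of the teacher-student self-training loop the abstract describes, assuming binary 0/1 causality labels. The paper fine-tunes transformer classifiers; the TF-IDF + logistic-regression stand-in here only keeps the example self-contained, and the confidence-ranked selection as well as the `max_pos` / `max_neg` caps are illustrative choices, not details taken from the authors' repository.

```python
# Sketch of a teacher-student self-training pipeline. Assumptions:
# binary 0/1 labels, TF-IDF + logistic regression in place of the
# paper's transformer classifiers, and illustrative caps `max_pos` /
# `max_neg` on self-labeled positive/negative examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def self_train(labeled_texts, labels, unlabeled_texts,
               max_pos=None, max_neg=None):
    # 1. Train the teacher on the true, original task data.
    teacher = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    teacher.fit(labeled_texts, labels)

    # 2. Self-label the unlabeled pool, most confident predictions first
    #    (a common self-training heuristic; the selection criterion is
    #    not specified in the abstract).
    probs = teacher.predict_proba(unlabeled_texts)[:, 1]
    ranked = sorted(zip(unlabeled_texts, probs),
                    key=lambda tp: abs(tp[1] - 0.5), reverse=True)
    pos = [(text, 1) for text, p in ranked if p >= 0.5]
    neg = [(text, 0) for text, p in ranked if p < 0.5]

    # 3. Optionally restrict the number of positive/negative self-labeled
    #    examples -- the restriction the paper ablates.
    pos = pos if max_pos is None else pos[:max_pos]
    neg = neg if max_neg is None else neg[:max_neg]

    # 4. Train a separate student on original + self-labeled data; the
    #    student makes the final task prediction.
    extra = pos + neg
    extra_texts = [text for text, _ in extra]
    extra_labels = [label for _, label in extra]
    student = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    student.fit(list(labeled_texts) + extra_texts, list(labels) + extra_labels)
    return student
```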
Similar Papers
In the same crypt · Computation & Language
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
👻
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
👻
Ghosted