๐
๐
Old Age
Exclusive Supermask Subnetwork Training for Continual Learning
October 18, 2022 ยท Entered Twilight ยท ๐ Annual Meeting of the Association for Computational Linguistics
Repo contents: LICENSE, README.md, args.py, configs, data, main_text.py, main_vision.py, models, requirements.txt, trainers, utils
Authors
Prateek Yadav, Mohit Bansal
arXiv ID
2210.10209
Category
cs.CV: Computer Vision
Cross-listed
cs.AI,
cs.CL,
cs.LG
Citations
9
Venue
Annual Meeting of the Association for Computational Linguistics
Repository
https://github.com/prateeky2806/exessnet
โญ 6
Last Checked
1 month ago
Abstract
Continual Learning (CL) methods focus on accumulating knowledge over time while avoiding catastrophic forgetting. Recently, Wortsman et al. (2020) proposed a CL method, SupSup, which uses a randomly initialized, fixed base network (model) and finds a supermask for each new task that selectively keeps or removes each weight to produce a subnetwork. They prevent forgetting as the network weights are not being updated. Although there is no forgetting, the performance of SupSup is sub-optimal because fixed weights restrict its representational power. Furthermore, there is no accumulation or transfer of knowledge inside the model when new tasks are learned. Hence, we propose ExSSNeT (Exclusive Supermask SubNEtwork Training), that performs exclusive and non-overlapping subnetwork weight training. This avoids conflicting updates to the shared weights by subsequent tasks to improve performance while still preventing forgetting. Furthermore, we propose a novel KNN-based Knowledge Transfer (KKT) module that utilizes previously acquired knowledge to learn new tasks better and faster. We demonstrate that ExSSNeT outperforms strong previous methods on both NLP and Vision domains while preventing forgetting. Moreover, ExSSNeT is particularly advantageous for sparse masks that activate 2-10% of the model parameters, resulting in an average improvement of 8.3% over SupSup. Furthermore, ExSSNeT scales to a large number of tasks (100). Our code is available at https://github.com/prateeky2806/exessnet.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted