AdapterDrop: On the Efficiency of Adapters in Transformers

October 22, 2020 · Declared Dead · 🏛 Conference on Empirical Methods in Natural Language Processing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych
arXiv ID: 2010.11918
Category: cs.LG (Machine Learning)
Cross-listed: cs.CL
Citations: 301
Venue: Conference on Empirical Methods in Natural Language Processing
Last Checked: 3 months ago

Abstract

Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inference, which incorporates concepts from all three directions. We show that AdapterDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performances. We further prune adapters from AdapterFusion, which improves the inference efficiency while maintaining the task performances entirely.
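The core idea in the abstract, removing adapters from the lower transformer layers, can be illustrated with a toy forward pass. This is a minimal sketch and not the authors' implementation: `toy_layer`, `toy_adapter`, `forward`, and the `drop_first_n` parameter are hypothetical names, and the arithmetic inside them is a placeholder for real transformer and adapter computations.

```python
from typing import List, Tuple

Vector = List[float]


def toy_layer(x: Vector) -> Vector:
    """Hypothetical stand-in for a frozen transformer layer."""
    return [0.5 * v for v in x]


def toy_adapter(x: Vector) -> Vector:
    """Hypothetical stand-in for an adapter with a residual connection."""
    return [v + 0.1 * v for v in x]


def forward(x: Vector, num_layers: int = 12, drop_first_n: int = 0) -> Tuple[Vector, int]:
    """Run the toy stack, skipping adapters in the first `drop_first_n` layers.

    Returns the output vector and the number of adapter calls executed.
    """
    adapter_calls = 0
    for layer_idx in range(num_layers):
        x = toy_layer(x)
        # Only layers at or above the cutoff keep their adapter.
        if layer_idx >= drop_first_n:
            x = toy_adapter(x)
            adapter_calls += 1
    return x, adapter_calls


if __name__ == "__main__":
    _, full = forward([1.0, 2.0], num_layers=12, drop_first_n=0)
    _, dropped = forward([1.0, 2.0], num_layers=12, drop_first_n=5)
    print(full, dropped)
```

In this sketch the first `drop_first_n` layers run without any task-specific component, so their output does not depend on the task. That is one way to read the abstract's claim about multi-task inference: the adapter-free lower layers could be computed once and reused across tasks.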

📜 Similar Papers

In the same crypt — Machine Learning

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago

R.I.P. 👻 Ghosted

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N. Kipf, Max Welling

cs.LG 🏛 ICLR 📚 33.5K cites 9 years ago

R.I.P. 👻 Ghosted

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, ... (+3 more)

cs.LG 🏛 arXiv 📚 25.1K cites 8 years ago

R.I.P. 👻 Ghosted

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, ... (+7 more)

cs.LG 🏛 JMLR 📚 24.4K cites 6 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, ... (+2 more)

cs.CV 🏛 CVPR 📚 43.4K cites 10 years ago

R.I.P. 👻 Ghosted

A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee

cs.AI 🏛 NeurIPS 📚 30.8K cites 9 years ago

R.I.P. 👻 Ghosted

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, ... (+3 more)

cs.CV 🏛 CVPR 📚 30.2K cites 10 years ago