R.I.P.
👻
Ghosted
A Survey on Inference Optimization Techniques for Mixture of Experts Models
December 18, 2024 · Declared Dead · ACM Computing Surveys
Repo contents: README.md
Authors
Jiacheng Liu, Peng Tang, Wenfeng Wang, Yuhang Ren, Xiaofeng Hou, Pheng-Ann Heng, Minyi Guo, Chao Li
arXiv ID
2412.14219
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, cs.DC
Citations
30
Venue
ACM Computing Surveys
Repository
https://github.com/MoE-Inf/awesome-moe-inference/
⭐ 342
Last Checked
1 month ago
Abstract
The emergence of large-scale Mixture of Experts (MoE) models represents a significant advancement in artificial intelligence, offering enhanced model capacity and computational efficiency through conditional computation. However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This comprehensive survey analyzes optimization techniques for MoE models across the entire system stack. We first establish a taxonomical framework that categorizes optimization approaches into model-level, system-level, and hardware-level optimizations. At the model level, we examine architectural innovations including efficient expert design, attention mechanisms, various compression techniques such as pruning, quantization, and knowledge distillation, as well as algorithmic improvements including dynamic routing strategies and expert merging methods. At the system level, we investigate distributed computing approaches, load balancing mechanisms, and efficient scheduling algorithms that enable scalable deployment. Furthermore, we delve into hardware-specific optimizations and co-design strategies that maximize throughput and energy efficiency. This survey provides a structured overview of existing solutions and identifies key challenges and promising research directions in MoE inference optimization. To facilitate ongoing updates and the sharing of cutting-edge advances in MoE inference optimization research, we have established a repository accessible at https://github.com/MoE-Inf/awesome-moe-inference/.
Similar Papers
In the same crypt: Machine Learning
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
👻
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
👻
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
👻
Ghosted
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Died the same way: Death by README
R.I.P.
Death by README
Momentum Contrast for Unsupervised Visual Representation Learning
R.I.P.
Death by README
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
R.I.P.
Death by README
Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach
R.I.P.
Death by README