Investigating the Scaling Effect of Instruction Templates for Training Multimodal Language Model

December 11, 2024 · Declared Dead · + Add venue

⚰️ CAUSE OF DEATH: The Empty Tomb
GitHub repo is empty
Authors Shijian Wang, Linxin Song, Jieyu Zhang, Ryotaro Shimizu, Jiarui Jin, Ao Luo, Yuan Lu, Li Yao, Cunjian Chen, Julian McAuley, Wentao Zhang, Hanqian Wu arXiv ID 2412.08307 Category cs.CV: Computer Vision Citations 1 Repository https://github.com/shijian2001/TemplateScaling Last Checked 1 month ago
Abstract
Current multimodal language model (MLM) training approaches overlook the influence of instruction templates. Previous research deals with this problem by leveraging hand-crafted or model-generated templates, failing to investigate the scaling effect of instruction templates on MLM training. In this work, we propose a programmatic instruction template generator capable of producing over 15K unique instruction templates by filling randomly sampled positional synonyms into weighted sampled meta templates, enabling us to comprehensively explore MLM's performance across various template scales in the training process. Our investigation into scaling instruction templates for MLM training demonstrates that MLM capabilities do not consistently improve with increasing template scale. Instead, optimal performance is achieved at a medium template scale. Models trained with data augmented at the optimal template scale achieve performance gains of up to 10% over those trained on the original data and achieve the best overall performance compared with the similar-scale MLMs tuned on at most 75 times the scale of our augmented dataset. The code will be publicly available at https://github.com/shijian2001/TemplateScaling.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Computer Vision

Died the same way — ⚰️ The Empty Tomb