C$^2$LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation

December 06, 2024 ยท Declared Dead ยท ๐Ÿ› arXiv.org

๐Ÿฆด CAUSE OF DEATH: Skeleton Repo
Boilerplate only, no real code

Repo contents: .gitignore, LICENSE, README.md, benchmark.svg, framework.gif

Authors Yanyang Li, Tin Long Wong, Cheung To Hung, Jianqiao Zhao, Duo Zheng, Ka Wai Liu, Michael R. Lyu, Liwei Wang arXiv ID 2412.04947 Category cs.CL: Computation & Language Citations 3 Venue arXiv.org Repository https://github.com/LaVi-Lab/C2LEVA โญ 2 Last Checked 1 month ago
Abstract
Recent advances in large language models (LLMs) have shown significant promise, yet their evaluation raises concerns, particularly regarding data contamination due to the lack of access to proprietary training data. To address this issue, we present C$^2$LEVA, a comprehensive bilingual benchmark featuring systematic contamination prevention. C$^2$LEVA firstly offers a holistic evaluation encompassing 22 tasks, each targeting a specific application or ability of LLMs, and secondly a trustworthy assessment due to our contamination-free tasks, ensured by a systematic contamination prevention strategy that fully automates test data renewal and enforces data protection during benchmark data release. Our large-scale evaluation of 15 open-source and proprietary models demonstrates the effectiveness of C$^2$LEVA.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago

Died the same way โ€” ๐Ÿฆด Skeleton Repo

R.I.P. ๐Ÿฆด Skeleton Repo

Neural Style Transfer: A Review

Yongcheng Jing, Yezhou Yang, ... (+4 more)

cs.CV ๐Ÿ› IEEE TVCG ๐Ÿ“š 828 cites 8 years ago