Multilingual Reference Need Assessment System for Wikipedia

March 17, 2026 · Grace Period · 🏛 Proceedings of the ACM Web Conference 2026 (WWW '26), April 13–17, 2026, Dubai, United Arab Emirates

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Aitolkyn Baigutanova, Francisco Navas, Pablo Aragon, Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, Miriam Redi, Diego Saez-Trumper
arXiv ID: 2603.17146
Category: cs.CY (Computers & Society)
Cross-listed: cs.CL
Citations: 0
Venue: Proceedings of the ACM Web Conference 2026 (WWW '26), April 13–17, 2026, Dubai, United Arab Emirates
Abstract
Wikipedia is a critical source of information for millions of users across the Web. It serves as a key resource for large language models, search engines, question-answering systems, and other Web-based applications. In Wikipedia, content needs to be verifiable, meaning that readers can check that claims are backed by references to reliable sources. This depends on manual verification by editors, an effective but labor-intensive process, especially given the high volume of daily edits. To address this challenge, we introduce a multilingual machine learning system to assist editors in identifying claims requiring citations. Our approach is tested in 10 language editions of Wikipedia, outperforming existing benchmarks for reference need assessment. We not only consider machine learning evaluation metrics but also system requirements, allowing us to explore the trade-offs between model accuracy and computational efficiency under real-world infrastructure constraints. We deploy our system in production and release data and code to support further research.
