Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages

September 28, 2020 · Entered Twilight · 🏛 arXiv.org

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitattributes, Ethiopic, LICENSE, NKo, Osmanya, README.md, Vai

Authors: Daniel J Wu, Andrew C Yang, Vinay U Prabhu
arXiv ID: 2009.13509
Category: cs.CV (Computer Vision)
Cross-listed: cs.LG
Citations: 6
Venue: arXiv.org
Repository: https://github.com/Daniel-Wu/AfroMNIST (⭐ 16)
Last Checked: 2 months ago

Abstract

We present Afro-MNIST, a set of synthetic MNIST-style datasets for four orthographies used in Afro-Asiatic and Niger-Congo languages: Ge'ez (Ethiopic), Vai, Osmanya, and N'Ko. These datasets serve as "drop-in" replacements for MNIST. We also describe and open-source a method for synthetic MNIST-style dataset generation from single examples of each digit. These datasets can be found at https://github.com/Daniel-Wu/AfroMNIST. We hope that MNIST-style datasets will be developed for other numeral systems, and that these datasets vitalize machine learning education in nations underrepresented in the research community.
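To make the idea of generating a dataset from a single example of each digit concrete, here is a minimal sketch. It is not the authors' released method: it substitutes a simple random affine perturbation (rotation, scale, shift) for whatever deformation the paper uses. The exemplar glyph is a placeholder bar rather than a real numeral, and the names `make_exemplar`, `random_affine`, and `generate_dataset` and all parameter values are hypothetical. It uses only the Python standard library so it runs without extra dependencies.

```python
import math
import random

SIZE = 28  # MNIST-style images are 28x28 grayscale


def make_exemplar():
    """Hypothetical stand-in for one exemplar glyph: a filled vertical bar."""
    img = [[0.0] * SIZE for _ in range(SIZE)]
    for r in range(6, 22):
        for c in range(12, 16):
            img[r][c] = 1.0
    return img


def random_affine(img, rng, max_rot_deg=15.0, max_shift=2.0, max_scale=0.1):
    """Return a randomly rotated, scaled, and shifted copy of img.

    Uses inverse mapping with nearest-neighbour sampling: for every output
    pixel, look up the source pixel it would have come from.
    """
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    centre = (SIZE - 1) / 2.0

    out = [[0.0] * SIZE for _ in range(SIZE)]
    for r in range(SIZE):
        for c in range(SIZE):
            # Undo the shift, then the rotation and scale, about the centre.
            x = c - centre - dx
            y = r - centre - dy
            src_x = (cos_t * x + sin_t * y) / scale + centre
            src_y = (-sin_t * x + cos_t * y) / scale + centre
            sr, sc = int(round(src_y)), int(round(src_x))
            if 0 <= sr < SIZE and 0 <= sc < SIZE:
                out[r][c] = img[sr][sc]
    return out


def generate_dataset(exemplars, per_class, seed=0):
    """Produce per_class perturbed copies of each exemplar, with labels."""
    rng = random.Random(seed)
    images, labels = [], []
    for label, exemplar in enumerate(exemplars):
        for _ in range(per_class):
            images.append(random_affine(exemplar, rng))
            labels.append(label)
    return images, labels


# Assumption: ten classes, each seeded from one (placeholder) exemplar.
exemplars = [make_exemplar() for _ in range(10)]
images, labels = generate_dataset(exemplars, per_class=5)
print(len(images), len(labels))  # prints: 50 50
```

In practice the placeholder exemplars would be replaced by one rasterized glyph per numeral, and `per_class` raised to match MNIST's scale. Fixing the seed keeps the generated set reproducible.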