VOLTAGE: A Versatile Contrastive Learning based OCR Methodology for ultra low-resource scripts through Auto Glyph Feature Extraction

October 12, 2025 ยท Declared Dead ยท ๐Ÿ› Conference of the European Chapter of the Association for Computational Linguistics

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Prawaal Sharma, Poonam Goyal, Vidisha Sharma, Navneet Goyal arXiv ID 2510.10490 Category cs.CL: Computation & Language Citations 1 Venue Conference of the European Chapter of the Association for Computational Linguistics Last Checked 3 months ago
Abstract
UNESCO has classified 2500 out of 7000 languages spoken worldwide as endangered. Attrition of a language leads to loss of traditional wisdom, folk literature, and the essence of the community that uses it. It is therefore imperative to bring digital inclusion to these languages and avoid its extinction. Low resource languages are at a greater risk of extinction. Lack of unsupervised Optical Character Recognition(OCR) methodologies for low resource languages is one of the reasons impeding their digital inclusion. We propose VOLTAGE - a contrastive learning based OCR methodology, leveraging auto-glyph feature recommendation for cluster-based labelling. We augment the labelled data for diversity and volume using image transformations and Generative Adversarial Networks. Voltage has been designed using Takri - a family of scripts used in 16th to 20th century in the Himalayan regions of India. We present results for Takri along with other Indic scripts (both low and high resource) to substantiate the universal behavior of the methodology. An accuracy of 95% for machine printed and 87% for handwritten samples on Takri script has been achieved. We conduct baseline and ablation studies along with building downstream use cases for Takri, demonstrating the usefulness of our work.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago

Died the same way โ€” ๐Ÿ‘ป Ghosted