JFinTEB: Japanese Financial Text Embedding Benchmark

April 17, 2026 · Grace Period · 🏛 SIGIR 2026 Resource Track

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Masahiro Suzuki, Hiroki Sakaji
arXiv ID: 2604.15882
Category: cs.IR (Information Retrieval)
Cross-listed: cs.CL
Citations: 0
Venue: SIGIR 2026 Resource Track
Abstract
We introduce JFinTEB, the first comprehensive benchmark specifically designed for evaluating Japanese financial text embeddings. Existing embedding benchmarks provide limited coverage of language-specific and domain-specific aspects found in Japanese financial texts. Our benchmark encompasses diverse task categories including retrieval and classification tasks that reflect realistic and well-defined financial text processing scenarios. The retrieval tasks leverage instruction-following datasets and financial text generation queries, while classification tasks cover sentiment analysis, document categorization, and domain-specific classification challenges derived from economic survey data. We conduct extensive evaluations across a wide range of embedding models, including Japanese-specific models of various sizes, multilingual models, and commercial embedding services. We publicly release JFinTEB datasets and evaluation framework at https://github.com/retarfi/JFinTEB to facilitate future research and provide a standardized evaluation protocol for the Japanese financial text mining community. This work addresses a critical gap in Japanese financial text processing resources and establishes a foundation for advancing domain-specific embedding research.