Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

December 02, 2019 · Entered Twilight · 🏛 arXiv.org

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: CONTRIBUTING.md, LICENSE, README.md, libml, mixmatch.py, mixmatch_lineargrow.py, requirements.txt, scripts

Authors: Shuang Song, David Berthelot, Afshin Rostamizadeh
arXiv ID: 1912.00594
Category: cs.LG (Machine Learning); cross-listed stat.ML
Citations: 35
Venue: arXiv.org
Repository: https://github.com/google-research/mma (⭐ 14)
Last checked: 1 month ago

Abstract

We propose using active-learning-based techniques to further improve MixMatch, a state-of-the-art semi-supervised learning algorithm. We provide a thorough empirical evaluation of several active-learning and baseline methods, which demonstrates a significant improvement on the CIFAR-10, CIFAR-100, and SVHN benchmark datasets (as much as 1.5% in absolute accuracy). We also provide an empirical analysis of the cost trade-off between incrementally gathering more labeled versus unlabeled data. This analysis can be used to measure the relative value of labeled and unlabeled data at different points of the learning curve. We find that although the incremental value of labeled data can be as much as 20x that of unlabeled data, it quickly diminishes to less than 3x once more than 2,000 labeled examples are observed. Code can be found at https://github.com/google-research/mma.
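The abstract does not spell out which active-learning criteria were evaluated. As a minimal sketch, assuming one common choice, the code below ranks unlabeled examples by the entropy of the model's class distribution averaged over K augmented views of each example. This reuses the idea MixMatch applies when guessing labels, where predictions are averaged over K augmentations. The functions `entropy`, `average_over_augmentations`, and `select_for_labeling` and the nested-list input layout are hypothetical illustrations, not the API of the mma repository.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector; zeros are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average_over_augmentations(aug_probs: Sequence[Sequence[float]]) -> List[float]:
    """Mean class distribution across the K augmented views of one example."""
    k = len(aug_probs)
    num_classes = len(aug_probs[0])
    return [sum(view[c] for view in aug_probs) / k for c in range(num_classes)]


def select_for_labeling(
    unlabeled_preds: Sequence[Sequence[Sequence[float]]], budget: int
) -> List[int]:
    """Return indices of the `budget` most uncertain unlabeled examples.

    Hypothetical layout: unlabeled_preds[i] holds K softmax vectors, one per
    augmentation of unlabeled example i. Uncertainty is the entropy of their mean.
    """
    scores = [entropy(average_over_augmentations(views)) for views in unlabeled_preds]
    # Highest-entropy (least confident) examples are sent to annotators first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    preds = [
        [[0.98, 0.01, 0.01], [0.96, 0.02, 0.02]],  # confident on both views
        [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]],  # near-uniform: most uncertain
        [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],        # views disagree somewhat
    ]
    print(select_for_labeling(preds, budget=2))  # [1, 2]
```

In an active-learning loop, the selected examples would be labeled, moved from the unlabeled pool to the labeled set, and MixMatch retrained before the next round.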