Faster Population Counts Using AVX2 Instructions

November 23, 2016 · Entered Twilight · 🏛 Computer/law journal

"Last commit was 7.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE, Makefile, README.md, benchmarks, compare_compiler.sh, format.py, include, src, test_build.sh, tests

Authors Wojciech Muła, Nathan Kurz, Daniel Lemire arXiv ID 1611.07612 Category cs.DS: Data Structures & Algorithms Citations 54 Venue Computer/law journal Repository https://github.com/CountOnes/hamming_weight ⭐ 51 Last Checked 1 month ago

Abstract

Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g., the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (clang).