Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning technologies

July 02, 2025 · Declared Dead · 🏛 International Journal of Data Informatics and Intelligent Computing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Benjamin A. Antunes
arXiv ID: 2507.03007
Category: cs.OH (Other CS)
Cross-listed: cs.CR, cs.LG
Citations: 0
Venue: International Journal of Data Informatics and Intelligent Computing
Last Checked: 1 month ago

Abstract

Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet, the statistical quality and reproducibility of these generators, particularly when integrated into frameworks like PyTorch, TensorFlow, and NumPy, are underexplored. In this paper, we compare the statistical quality of PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our findings challenge claims of statistical robustness, revealing that even generators labeled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests. Surprisingly, we can observe some differences in failure profiles between the native and framework-integrated versions of the same algorithm, highlighting some implementation differences that may exist.
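To make the abstract's setup concrete, here is a minimal sketch of testing many seeded streams of one generator. It is not the paper's method. It uses only Python's standard `random` module, whose core generator is the Mersenne Twister, and it replaces TestU01 BigCrush with a single monobit frequency test, which is far weaker. The stream count of 896 comes from the abstract; the seeding scheme, the sample size, the threshold, and the names `make_stream`, `monobit_p_value`, and `count_suspect_streams` are all assumptions made for illustration.

```python
import math
import random

N_STREAMS = 896          # stream count taken from the abstract
BITS_PER_STREAM = 4096   # assumption: small sample, chosen only to keep the sketch fast


def make_stream(stream_id: int, base_seed: int = 12345) -> random.Random:
    """Return a seeded Mersenne Twister stream.

    The seeding scheme is hypothetical; the abstract does not say how the
    896 streams were derived, and distinct integer seeds do not by
    themselves guarantee statistically independent streams.
    """
    return random.Random(base_seed * 1_000_003 + stream_id)


def monobit_p_value(rng: random.Random, n_bits: int) -> float:
    """Two-sided p-value of the monobit frequency test on n_bits bits."""
    ones = bin(rng.getrandbits(n_bits)).count("1")
    # Under the null hypothesis, (ones - zeros) / sqrt(n) is roughly N(0, 1).
    s_obs = abs(2 * ones - n_bits) / math.sqrt(n_bits)
    return math.erfc(s_obs / math.sqrt(2))


def count_suspect_streams(alpha: float = 0.001) -> int:
    """Count streams whose p-value falls below alpha (an assumed threshold)."""
    return sum(
        monobit_p_value(make_stream(i), BITS_PER_STREAM) < alpha
        for i in range(N_STREAMS)
    )


# Reproducibility check: two generators built from the same seed
# should replay the same sequence.
a = make_stream(0)
b = make_stream(0)
replayed = [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

if __name__ == "__main__":
    print("same seed replays:", replayed)
    print("suspect streams:", count_suspect_streams())
```

Even a sound generator is expected to produce an occasional low p-value across hundreds of streams, so a nonzero count from this sketch does not on its own indicate a defect. Comparing failure profiles as the abstract describes would need a full battery such as BigCrush.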

📜 Similar Papers

In the same crypt — Other CS

R.I.P. 👻 Ghosted

6-Layer Model for a Structured Description and Categorization of Urban Traffic and Environment

Maike Scholtes, Lukas Westhofen, ... (+12 more)

cs.OH 🏛 IEEE Access 📚 160 cites 5 years ago

R.I.P. 👻 Ghosted

DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car

Michael G. Bechtel, Elise McEllhiney, ... (+2 more)

cs.OH 🏛 ICERCSA 📚 114 cites 8 years ago

R.I.P. 👻 Ghosted

Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour

Andrea Aler Tubella, Andreas Theodorou, ... (+2 more)

cs.OH 🏛 IJCAI 📚 43 cites 6 years ago

R.I.P. 👻 Ghosted

Pragmatic inference and visual abstraction enable contextual flexibility during visual communication

Judith Fan, Robert Hawkins, ... (+2 more)

cs.OH 🏛 Computational Brain & Behavior 📚 43 cites 7 years ago

R.I.P. 👻 Ghosted

Design and Implementation of a Novel Compatible Encoding Scheme in the Time Domain for Image Sensor Communication

Trang Nguyen, Mohammad Arif Hossain, Yeong Min Jang

cs.OH 🏛 Sensors 📚 35 cites 9 years ago

R.I.P. 👻 Ghosted

Detecting Plagiarism based on the Creation Process

Johannes Schneider, Avi Bernstein, ... (+3 more)

cs.OH 🏛 IEEE TLT 📚 29 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago