Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning technologies
July 02, 2025 · Declared Dead · International Journal of Data Informatics and Intelligent Computing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Benjamin A. Antunes
arXiv ID
2507.03007
Category
cs.OH: Other CS
Cross-listed
cs.CR, cs.LG
Citations
0
Venue
International Journal of Data Informatics and Intelligent Computing
Last Checked
1 month ago
Abstract
Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet the statistical quality and reproducibility of these generators, particularly when integrated into frameworks like PyTorch, TensorFlow, and NumPy, remain underexplored. In this paper, we compare the statistical quality of PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our findings challenge claims of statistical robustness: even generators labeled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests. Surprisingly, we also observe differences in failure profiles between the native and framework-integrated versions of the same algorithm, suggesting that implementation differences may exist between them.
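To make the framework side of this comparison concrete, the sketch below shows one way to draw raw 32-bit output words from NumPy's MT19937, PCG64, and Philox bit generators, giving each stream its own child seed via `SeedSequence.spawn`. Raw 32-bit words are the form a battery such as TestU01 consumes. This is an illustrative assumption, not the paper's test harness: the helper names (`raw_uint32_stream`, `independent_streams`), the low-then-high splitting of 64-bit outputs, and the spawn-based seeding are hypothetical choices, and the abstract does not say how its 896 streams were seeded.

```python
import numpy as np

# NumPy bit generators for the three algorithm families named in the abstract.
BIT_GENERATORS = {
    "mt19937": np.random.MT19937,
    "pcg64": np.random.PCG64,
    "philox": np.random.Philox,
}

LOW32_MASK = np.uint64(0xFFFFFFFF)
SHIFT32 = np.uint64(32)


def raw_uint32_stream(name, seed_seq, n_words):
    """Return n_words raw 32-bit outputs from one NumPy bit generator.

    Hypothetical helper. MT19937 natively produces 32-bit words; PCG64 and
    Philox produce 64-bit words, which are split here into their low and
    high 32-bit halves (an illustrative ordering, not a standard).
    """
    bitgen = BIT_GENERATORS[name](seed_seq)
    if name == "mt19937":
        # random_raw always returns uint64, but MT19937 only fills the low 32 bits.
        return bitgen.random_raw(n_words).astype(np.uint32)
    raw = bitgen.random_raw((n_words + 1) // 2)
    lo = (raw & LOW32_MASK).astype(np.uint32)
    hi = (raw >> SHIFT32).astype(np.uint32)
    # Interleave as lo0, hi0, lo1, hi1, ... and trim to the requested length.
    return np.column_stack((lo, hi)).ravel()[:n_words]


def independent_streams(name, root_seed, n_streams, n_words):
    """Derive n_streams child seeds from one root seed and draw one stream per child.

    Hypothetical helper; the same root_seed always reproduces the same streams.
    """
    children = np.random.SeedSequence(root_seed).spawn(n_streams)
    return [raw_uint32_stream(name, child, n_words) for child in children]
```

For example, `independent_streams("philox", 12345, 896, 1_000_000)` would yield 896 reproducible word streams of one million words each; feeding them to BigCrush would still require a bridge to TestU01's C interface, which is outside this sketch.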
Similar Papers
In the same crypt: Other CS
DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car
Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour
Pragmatic inference and visual abstraction enable contextual flexibility during visual communication
Design and Implementation of a Novel Compatible Encoding Scheme in the Time Domain for Image Sensor Communication
Detecting Plagiarism based on the Creation Process
Died the same way: Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System