Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
June 12, 2017 Β· Declared Dead Β· π IEEE International Parallel and Distributed Processing Symposium
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello
arXiv ID
1706.03791
Category
cs.IT: Information Theory
Citations
294
Venue
IEEE International Parallel and Distributed Processing Symposium
Last Checked
3 months ago
Abstract
Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed based on the preceding decompressed values during the compression in order to guarantee the error bounds, which may degrade the prediction accuracy in turn. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after performing the variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Information Theory
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
A Vision of 6G Wireless Systems: Applications, Trends, Technologies, and Open Research Problems
R.I.P.
π»
Ghosted
Towards Smart and Reconfigurable Environment: Intelligent Reflecting Surface Aided Wireless Network
π
π
The Cartographer
Wireless Communications with Unmanned Aerial Vehicles: Opportunities and Challenges
R.I.P.
π»
Ghosted
Reconfigurable Intelligent Surfaces for Energy Efficiency in Wireless Communication
π
π
The Cartographer
An Overview of Signal Processing Techniques for Millimeter Wave MIMO Systems
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted