Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification

February 16, 2018 · Entered Twilight · 🏛 Information Sciences

🌅 TWILIGHT: Old Age
Predates the code-sharing era – a pioneer of its time

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, data, scripts, server.R, ui.R

Authors: Hector Zenil, Narsis A. Kiani, Alyssa Adams, Felipe S. Abrahão, Antonio Rueda-Toicen, Allan A. Zea, Luan Ozelim, Jesper Tegnér
arXiv ID: 1802.05843
Category: cs.DS (Data Structures & Algorithms)
Cross-listed: cs.IT, physics.soc-ph
Citations: 16
Venue: Information Sciences
Repository: https://github.com/andandandand/MILS (⭐ 1)
Last checked: 28 days ago
Abstract
We present a novel, domain-agnostic, model-independent, unsupervised, and universally applicable Machine Learning approach for dimensionality reduction based on the principles of algorithmic complexity. Specifically, but without loss of generality, we focus on addressing the challenge of reducing certain dimensionality aspects, such as the number of edges in a network, while retaining essential features of interest. These features include preserving crucial network properties like degree distribution, clustering coefficient, edge betweenness, and degree and eigenvector centralities but can also go beyond edges to nodes and weights for network pruning and trimming. Our approach outperforms classical statistical Machine Learning techniques and state-of-the-art dimensionality reduction algorithms by preserving a greater number of data features that statistical algorithms would miss, particularly nonlinear patterns stemming from deterministic recursive processes that may look statistically random but are not. Moreover, previous approaches heavily rely on a priori feature selection, which requires constant supervision. Our findings demonstrate the effectiveness of the algorithms in overcoming some of these limitations while maintaining a time-efficient computational profile. Our approach not only matches, but also exceeds, the performance of established and state-of-the-art dimensionality reduction algorithms. We extend the applicability of our method to lossy compression tasks involving images and any multi-dimensional data. This highlights the versatility and broad utility of the approach in multiple domains.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt – Data Structures & Algorithms