Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds
April 02, 2024 · Declared Dead · Annual Conference Computational Learning Theory
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
arXiv ID
2404.02364
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.LG
Citations
10
Venue
Annual Conference Computational Learning Theory
Last Checked
4 months ago
Abstract
Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution $\mathcal{D}$, unlabeled samples from test distribution $\mathcal{D}'$, and the goal is to output a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $\epsilon$ (prior work achieved $d^{(k/\epsilon)^{O(1)}}$). We work under the mild assumption that the Gaussian training distribution contains at least an $\epsilon$ fraction of both positive and negative examples ($\epsilon$-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the $\epsilon$-balanced assumption is necessary for $\mathsf{poly}(d,1/\epsilon)$-time TDS learning for a single halfspace and (2) a $d^{\tilde{\Omega}(\log 1/\epsilon)}$ lower bound for the intersection of two general halfspaces, even with the $\epsilon$-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.
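For context, the discrepancy distance mentioned at the end of the abstract is usually stated in the domain adaptation literature in roughly the following global form. This is a sketch of that standard definition, not the paper's localized variant, which the abstract does not spell out. The concept class $\mathcal{C}$ and the marginal notation $\mathcal{D}_x$, $\mathcal{D}'_x$ are assumptions introduced here for illustration.

$$
\mathrm{disc}_{\mathcal{C}}(\mathcal{D}_x, \mathcal{D}'_x)
= \sup_{f, f' \in \mathcal{C}}
\left|
\Pr_{x \sim \mathcal{D}_x}\big[f(x) \neq f'(x)\big]
- \Pr_{x \sim \mathcal{D}'_x}\big[f(x) \neq f'(x)\big]
\right|
$$

Under this definition the distance is zero whenever the two marginals are equal, which is consistent with the requirement that the test accept in that case.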
Similar Papers
In the same crypt: Data Structures & Algorithms
The Cartographer
R.I.P.
Ghosted
Route Planning in Transportation Networks
R.I.P.
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
Ghosted
Graph Isomorphism in Quasipolynomial Time
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted