Statistical consistency and asymptotic normality for high-dimensional robust M-estimators
January 01, 2015 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors: Po-Ling Loh
arXiv ID: 1501.00312
Category: math.ST
Cross-listed: cs.IT, stat.ML
Citations: 216
Venue: arXiv.org
Last Checked: 1 month ago
Abstract
We study theoretical properties of regularized robust M-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an ℓ1-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex M-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex M-estimator to achieve consistency and a nonconvex M-estimator to increase efficiency.
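No code accompanies this paper, so the following is only a minimal NumPy sketch of the kind of two-step procedure the abstract describes, not the author's implementation. Every concrete choice here is an assumption: the Huber loss (δ = 1.345) with an ℓ1 penalty for the convex first step; Tukey's biweight loss (c = 4.685) with the MCP penalty (γ = 3) for the nonconvex second step; plain composite (proximal) gradient descent with step size n/‖X‖₂²; no side constraint on ‖β‖₁ and no residual-scale estimate; and a synthetic Gaussian design with Student-t (2 degrees of freedom) errors. The function names (`composite_gd`, `mcp_prox`, `two_step_estimator`, etc.) are hypothetical.

```python
import numpy as np

def huber_psi(r, delta=1.345):
    """Derivative of the Huber loss (convex loss, |psi| <= delta)."""
    return np.clip(r, -delta, delta)

def tukey_psi(r, c=4.685):
    """Derivative of Tukey's biweight loss (nonconvex, redescending to 0 beyond c)."""
    u = r / c
    return np.where(np.abs(u) <= 1.0, r * (1.0 - u**2) ** 2, 0.0)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mcp_prox(z, t, lam, gamma):
    """Proximal operator of t * MCP(lam, gamma), i.e. firm thresholding; needs gamma > t."""
    small = np.abs(z) <= gamma * lam
    shrunk = soft_threshold(z, t * lam) / (1.0 - t / gamma)
    return np.where(small, shrunk, z)  # coefficients above gamma*lam are left unpenalized

def composite_gd(X, y, psi, prox, beta0, step, n_iter=1000):
    """Composite (proximal) gradient descent on (1/n) sum_i loss(y_i - x_i^T beta) + penalty."""
    n = X.shape[0]
    beta = beta0.copy()
    for _ in range(n_iter):
        grad = -X.T @ psi(y - X @ beta) / n  # gradient of the empirical robust loss
        beta = prox(beta - step * grad, step)
    return beta

def two_step_estimator(X, y, lam, gamma=3.0, n_iter=1000):
    """Hypothetical two-step scheme: convex fit for initialization, nonconvex fit for efficiency."""
    n, p = X.shape
    # Both psi functions satisfy |psi'| <= 1, so the loss gradient is Lipschitz with
    # constant at most ||X||_2^2 / n; the step is the inverse of that bound.
    step = n / np.linalg.norm(X, 2) ** 2
    assert gamma > step, "the MCP prox formula requires gamma > step size"
    # Step 1: Huber loss + l1 penalty, started from zero (convex problem).
    beta_init = composite_gd(X, y, huber_psi,
                             lambda z, t: soft_threshold(z, t * lam),
                             np.zeros(p), step, n_iter)
    # Step 2: Tukey loss + MCP penalty, initialized at the step-1 estimate.
    beta_hat = composite_gd(X, y, tukey_psi,
                            lambda z, t: mcp_prox(z, t, lam, gamma),
                            beta_init, step, n_iter)
    return beta_init, beta_hat

# Synthetic sparse model with heavy-tailed errors (assumption: noise scale ~1,
# so residuals are not rescaled before applying the robust losses).
rng = np.random.default_rng(0)
n, p, k = 200, 500, 5
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:k] = 2.0
y = X @ beta_star + rng.standard_t(df=2, size=n)

lam = 2.0 * np.sqrt(np.log(p) / n)  # illustrative choice of order sqrt(log p / n)
beta_init, beta_hat = two_step_estimator(X, y, lam)
err_init = np.linalg.norm(beta_init - beta_star)
err_hat = np.linalg.norm(beta_hat - beta_star)
print(f"step 1 (Huber + l1) error:  {err_init:.3f}")
print(f"step 2 (Tukey + MCP) error: {err_hat:.3f}")
```

On a run like this one would expect the second step to reduce the ℓ2 error, because the MCP stops shrinking coefficients larger than γλ and so removes the ℓ1 bias of the first step; the printed values depend on the seed and are not results reported in the paper.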
Similar Papers
In the same crypt: math.ST
An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists
Minimax Optimal Procedures for Locally Private Estimation
Optimal Best Arm Identification with Fixed Confidence
Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees
User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient
Died the same way: Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System