Towards Gradient Free and Projection Free Stochastic Optimization

October 08, 2018 · Declared Dead · 🏛 International Conference on Artificial Intelligence and Statistics

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Anit Kumar Sahu, Manzil Zaheer, Soummya Kar
arXiv ID: 1810.03233
Category: math.OC (Optimization & Control)
Cross-listed: cs.LG
Citations: 45
Venue: International Conference on Artificial Intelligence and Statistics
Last Checked: 3 months ago

Abstract

This paper focuses on the problem of constrained stochastic optimization. A zeroth-order Frank-Wolfe algorithm is proposed which, in addition to inheriting the projection-free nature of the vanilla Frank-Wolfe algorithm, is gradient free. Under convexity and smoothness assumptions, we show that the proposed algorithm converges to the optimal objective function value at a rate $O\left(1/T^{1/3}\right)$, where $T$ denotes the iteration count. In particular, the primal sub-optimality gap is shown to have a dimension dependence of $O\left(d^{1/3}\right)$, which is the best known dimension dependence among all zeroth-order optimization algorithms with one directional derivative per iteration. For non-convex functions, the Frank-Wolfe gap is shown to be $O\left(d^{1/3}T^{-1/4}\right)$. Experiments on black-box optimization setups demonstrate the efficacy of the proposed algorithm.
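The abstract describes the method only at a high level, and no code accompanies the paper. As a rough illustration of what "gradient free and projection free" means in practice, here is a minimal sketch of a zeroth-order stochastic Frank-Wolfe loop, assuming an L1-ball constraint, a single Gaussian random direction per iteration for the finite-difference gradient estimate, and a running average of those estimates to tame their variance. The schedules for the step size, averaging weight, and smoothing radius, as well as the names `zo_frank_wolfe` and `lmo_l1_ball`, are hypothetical illustrative choices, not the paper's exact algorithm or constants.

```python
import math
import random


def lmo_l1_ball(d, radius):
    """Linear minimization oracle for the L1 ball: argmin_{||s||_1 <= radius} <s, d>.

    The minimizer is a signed vertex: all mass on the coordinate with the
    largest |d_i|, with sign opposite to d_i. No projection is ever needed.
    """
    i = max(range(len(d)), key=lambda k: abs(d[k]))
    s = [0.0] * len(d)
    s[i] = -radius if d[i] > 0 else radius
    return s


def zo_frank_wolfe(F, sample, x0, radius, T, seed=0):
    """Hypothetical zeroth-order stochastic Frank-Wolfe sketch.

    F(x, xi) -> float is queried only through function values; sample(rng)
    draws one noise realization xi. The schedules below are assumptions.
    """
    rng = random.Random(seed)
    dim = len(x0)
    x = [float(v) for v in x0]
    d_avg = [0.0] * dim  # running average of the gradient estimates
    for t in range(T):
        gamma = 2.0 / (t + 8)                                # Frank-Wolfe step size (assumed)
        rho = 4.0 / (dim ** (1 / 3) * (t + 8) ** (2 / 3))    # averaging weight (assumed)
        c = 2.0 / (dim ** 1.5 * (t + 8) ** (1 / 3))          # smoothing radius (assumed)
        xi = sample(rng)
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]        # one random direction
        x_pert = [xj + c * zj for xj, zj in zip(x, z)]
        # One directional finite difference along z, reusing the same sample xi.
        dir_deriv = (F(x_pert, xi) - F(x, xi)) / c
        g = [dir_deriv * zj for zj in z]
        # Averaging damps the variance of the single-direction estimate.
        d_avg = [(1 - rho) * dj + rho * gj for dj, gj in zip(d_avg, g)]
        v = lmo_l1_ball(d_avg, radius)
        # Convex combination of feasible points keeps x inside the ball.
        x = [(1 - gamma) * xj + gamma * vj for xj, vj in zip(x, v)]
    return x


# Usage on a toy noisy quadratic whose minimizer lies inside the unit L1 ball.
x_star = [0.3, -0.2, 0.1, 0.0, 0.0]


def f(x):
    """True objective ||x - x_star||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, x_star))


def F(x, xi):
    """Noisy objective; E[F(x, xi)] = f(x) because E[xi] = 0."""
    return f(x) + xi * sum(x)


x0 = [1.0, 0.0, 0.0, 0.0, 0.0]
x_T = zo_frank_wolfe(F, lambda rng: rng.gauss(0.0, 0.01), x0, radius=1.0, T=3000)
print(f"f(x0) = {f(x0):.3f}, f(x_T) = {f(x_T):.4f}")
```

The iterate only ever moves toward a vertex returned by the linear minimization oracle, so feasibility is preserved by construction, and the objective is touched only through two function evaluations per step. Swapping `lmo_l1_ball` for another oracle (for example, the probability simplex) changes the constraint set without touching the rest of the loop.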



📜 Similar Papers

In the same crypt — Optimization & Control

R.I.P. 👻 Ghosted

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Xiangru Lian, Ce Zhang, ... (+4 more)

math.OC 🏛 NeurIPS 📚 1.4K cites 9 years ago

R.I.P. 👻 Ghosted

Local SGD Converges Fast and Communicates Little

Sebastian U. Stich

math.OC 🏛 ICLR 📚 1.2K cites 8 years ago

R.I.P. 👻 Ghosted

On Lazy Training in Differentiable Programming

Lenaic Chizat, Edouard Oyallon, Francis Bach

math.OC 🏛 NeurIPS 📚 930 cites 7 years ago

R.I.P. 👻 Ghosted

A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications

Ankur Sinha, Pekka Malo, Kalyanmoy Deb

math.OC 🏛 IEEE TEC 📚 840 cites 9 years ago

R.I.P. 👻 Ghosted

Learned Primal-dual Reconstruction

Jonas Adler, Ozan Öktem

math.OC 🏛 IEEE TMI 📚 834 cites 8 years ago

R.I.P. 👻 Ghosted

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Lenaic Chizat, Francis Bach

math.OC 🏛 NeurIPS 📚 805 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago