Gradient-Free Approaches to Update Weights in Deep Learning
Gradient Descent for Non-Differentiable Functions
- Smooth approximation
- ε-subgradient method
- Cutting plane method
- Subgradient method (see the sketch after this list)
- Evolutionary algorithms (EA)
- Conjugate gradient method
- Hessian-free optimization method
- Quasi-Newton methods
- Genetic algorithms (GA)
- BFGS
- Simulated annealing
 
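To make the list concrete, below is a minimal sketch of the subgradient method on a non-differentiable objective (L1-regularized least squares). The toy problem, step-size schedule, and iteration count are illustrative assumptions, not taken from any particular reference above.

```python
import numpy as np

# Subgradient-method sketch for a non-differentiable objective (illustrative assumption):
#   f(w) = 0.5 * ||X w - y||^2 + lam * ||w||_1
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=50)
lam = 0.1

w = np.zeros(10)
for t in range(1, 1001):
    # sign(w) is a valid subgradient of ||w||_1 (any value in [-1, 1] works where w_i == 0)
    g = X.T @ (X @ w - y) + lam * np.sign(w)
    w -= (0.01 / np.sqrt(t)) * g   # diminishing step size, needed for convergence

print("recovered weights:", np.round(w, 2))
```
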
Gradient-Free Optimization Algorithms:
- Bayesian optimization
- Coordinate descent and adaptive coordinate descent
- Cuckoo search
- Beetle Antennae Search (BAS)
- DONE
- Evolution strategies, Natural evolution strategies (CMA-ES, xNES, SNES) (see the sketch after this list)
- Genetic algorithms
- MCS algorithm
- Nelder-Mead method
- Particle swarm optimization
- Pattern search
- Random search (including Luus–Jaakola)
- Simulated annealing
- Stochastic optimization
- Subgradient method
 
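As a worked example of one entry from this list, here is a sketch of an evolution-strategies update (antithetic sampling, in the spirit of natural evolution strategies) applied to the weights of a tiny two-layer network, with no backpropagation anywhere. The architecture, population size, noise scale, and learning rate are illustrative assumptions.

```python
import numpy as np

# Evolution-strategies sketch: update a small network's weights without any gradients.
# Architecture, sigma, learning rate, and population size are illustrative assumptions.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(float)        # toy binary target

def unpack(theta):
    W1 = theta[:32].reshape(4, 8); b1 = theta[32:40]
    W2 = theta[40:48].reshape(8, 1); b2 = theta[48:49]
    return W1, b1, W2, b2

def loss(theta):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2))).ravel()
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

theta = rng.normal(scale=0.1, size=49)
sigma, lr, pop = 0.1, 0.05, 50
for step in range(200):
    eps = rng.normal(size=(pop, theta.size))
    # Antithetic sampling: compare the loss at theta + sigma*eps and theta - sigma*eps.
    scores = np.array([loss(theta + sigma * e) - loss(theta - sigma * e) for e in eps])
    grad_est = (eps * scores[:, None]).mean(axis=0) / (2 * sigma)
    theta -= lr * grad_est            # step against the estimated ascent direction of the loss

print("final loss:", round(loss(theta), 3))
```
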
Further Reading
- Difference Target Propagation: https://arxiv.org/pdf/1412.7525.pdf
- The HSIC Bottleneck (Hilbert-Schmidt Independence Criterion): https://arxiv.org/pdf/1908.01580v1.pdf
- Online Alternating Minimization with Auxiliary Variables: https://arxiv.org/pdf/1806.09077.pdf
- Decoupled Neural Interfaces Using Synthetic Gradients: https://arxiv.org/pdf/1608.05343.pdf
- Accelerated Stochastic Gradient-free and Projection-free Methods
- On Correctness of Automatic Differentiation for Non-Differentiable Functions
- Direct Feedback Alignment Provides Learning in Deep Neural Networks
- Cubature Kalman filtering for training deep neural networks
 
Other derivative-free options: the Nelder-Mead simplex algorithm, Powell's method, and the Hooke-Jeeves method (a SciPy Nelder-Mead sketch follows).

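A minimal sketch of the last of these points, using SciPy's built-in Nelder-Mead solver (Powell works as a drop-in replacement via method='Powell') to fit a small linear model without gradients; the toy data and tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Derivative-free fitting with SciPy's Nelder-Mead solver (toy problem is an assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=40)

def loss(w):
    return np.mean((X @ w - y) ** 2)     # mean squared error, evaluated by function calls only

result = minimize(loss, x0=np.zeros(3), method="Nelder-Mead",
                  options={"maxiter": 2000, "xatol": 1e-8, "fatol": 1e-8})
print("weights:", np.round(result.x, 3), "loss:", round(result.fun, 4))
```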