Gradient-Free Approaches to Updating Weights in Deep Learning
Gradient Descent for Non-Differentiable Functions
- Smooth Approximation
- ε-subgradient method
- Cutting plane method
- Subgradient method
- Evolutionary algorithms (EA)
- Conjugate gradient method
- Hessian-free optimization
- Quasi-Newton method
- Genetic algorithms (GA)
- BFGS
- Simulated Annealing
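As a concrete illustration of the subgradient method listed above, the sketch below minimizes an L1-regularized least-squares objective, which is non-differentiable at zero. The data, penalty weight, and step-size schedule are illustrative assumptions, not from the original post.

```python
# Minimal subgradient-method sketch on a non-differentiable objective
# (L1-regularized least squares). All names and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:5] = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=100)

lam = 0.1          # L1 penalty weight
w = np.zeros(20)   # parameters to learn

for t in range(1, 501):
    # Smooth part: gradient of 0.5 * ||Xw - y||^2 / n
    grad_smooth = X.T @ (X @ w - y) / len(y)
    # Non-smooth part: sign(w) is a valid subgradient of ||w||_1
    # (at w_i = 0 any value in [-1, 1] is admissible; 0 is used here)
    subgrad_l1 = lam * np.sign(w)
    # Diminishing step size, a standard choice for subgradient methods
    w -= (0.5 / np.sqrt(t)) * (grad_smooth + subgrad_l1)

print("recovered leading weights:", np.round(w[:5], 3))
```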
Gradient-Free Optimization Algorithms:
- Bayesian optimization
- Coordinate descent and adaptive coordinate descent
- Cuckoo search
- Beetle Antennae Search (BAS)
- DONE algorithm
- Evolution strategies, Natural evolution strategies (CMA-ES, xNES, SNES)
- Genetic algorithms
- MCS (Multilevel Coordinate Search) algorithm
- Nelder-Mead method
- Particle swarm optimization
- Pattern search
- Random search (including Luus–Jaakola)
- Simulated annealing
- Stochastic optimization
- Subgradient method
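To make the evolution-strategies entry above concrete, here is a minimal sketch that updates the weights of a tiny one-layer model without any backpropagation, using reward-weighted random perturbations. The toy data, population size, sigma, and learning rate are illustrative assumptions.

```python
# Minimal evolution-strategy sketch: gradient-free weight updates
# for a tiny tanh model. All names and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.tanh(X @ np.array([1.5, -2.0, 0.5]))   # targets from a hidden weight vector

def loss(w):
    pred = np.tanh(X @ w)          # forward pass of the tiny model
    return np.mean((pred - y) ** 2)

w = np.zeros(3)                    # weights to learn; no gradients anywhere below
sigma, alpha, pop = 0.1, 0.05, 50

for step in range(300):
    noise = rng.normal(size=(pop, w.size))            # perturbation directions
    rewards = np.array([-loss(w + sigma * n) for n in noise])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Estimated search direction: reward-weighted average of the perturbations
    w += alpha / (pop * sigma) * noise.T @ rewards

print("final loss:", loss(w))
```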
Further Reading
- Difference Target Propagation https://arxiv.org/pdf/1412.7525.pdf
- The HSIC Bottleneck (Hilbert-Schmidt Independence Criterion) https://arxiv.org/pdf/1908.01580v1.pdf
- Online Alternating Minimization with Auxiliary Variables https://arxiv.org/pdf/1806.09077.pdf
- Decoupled Neural Interfaces Using Synthetic Gradients https://arxiv.org/pdf/1608.05343.pdf
- Accelerated Stochastic Gradient-free and Projection-free Methods
- On Correctness of Automatic Differentiation for Non-Differentiable Functions
- Direct Feedback Alignment Provides Learning in Deep Neural Networks
- Cubature Kalman filtering for training deep neural networks
Other: the Nelder–Mead simplex algorithm, Powell's method, and the Hooke–Jeeves method.
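For the Nelder–Mead simplex method mentioned above, a practical starting point is SciPy's built-in implementation. The sketch below fits the weights and bias of a single sigmoid neuron to synthetic targets without computing any gradients; the model, data, and starting point are illustrative assumptions.

```python
# Minimal Nelder-Mead sketch using scipy.optimize.minimize.
# Model, data, and initial guess are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 / (1.0 + np.exp(-(X @ np.array([2.0, -1.0]) + 0.5)))   # synthetic targets

def loss(params):
    w, b = params[:2], params[2]
    pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # single sigmoid neuron
    return np.mean((pred - y) ** 2)

result = minimize(loss, x0=np.zeros(3), method="Nelder-Mead")
print(result.x, result.fun)   # recovered weights/bias and final loss
```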