Gradient Descent: Part 12: Conceptual Bridge to References
12. Conceptual Bridge
Gradient Descent sits inside a chain. Earlier sections give the calculus, probability, and linear algebra needed to write the objective and interpret the update. Later sections use this material to reason about noisy gradients, adaptive state, regularization, tuning, schedules, and finally information-theoretic losses.
Backward link: Convex Optimization supplies the immediate prerequisite vocabulary.
Forward link: Second-Order Methods uses this section as a building block.
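Before following either link, it helps to keep the core update in view: theta_{k+1} = theta_k - eta * grad f(theta_k). The sketch below is a minimal, illustrative Python rendering of that update; the quadratic toy objective, step size, and iteration count are assumptions chosen for demonstration, not values taken from this lesson.

    # Minimal sketch (illustrative, not this lesson's reference implementation).
    # Plain gradient descent: theta <- theta - eta * grad_f(theta).
    import numpy as np

    def gradient_descent(grad_f, theta0, eta=0.1, num_steps=200):
        """Run fixed-step gradient descent and return the final iterate."""
        theta = np.asarray(theta0, dtype=float)
        for _ in range(num_steps):
            theta = theta - eta * grad_f(theta)
        return theta

    # Assumed toy objective: f(theta) = 0.5 * ||theta||^2, so grad f(theta) = theta.
    theta_final = gradient_descent(lambda theta: theta, theta0=[3.0, -2.0])
    print(theta_final)  # shrinks toward the minimizer at the origin

Later sections replace the exact gradient with a noisy estimate, add per-parameter state, and vary eta over time, but the skeleton above stays the same.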
+-----------------------------------------------------------------+
| Chapter 8: Optimization                                         |
|    01-Convex-Optimization           Convex Optimization         |
| >> 02-Gradient-Descent              Gradient Descent            |
|    03-Second-Order-Methods          Second-Order Methods        |
|    04-Constrained-Optimization      Constrained Optimization    |
|    05-Stochastic-Optimization       Stochastic Optimization     |
|    06-Optimization-Landscape        Optimization Landscape      |
|    07-Adaptive-Learning-Rate        Adaptive Learning Rate      |
|    08-Regularization-Methods        Regularization Methods      |
|    09-Hyperparameter-Optimization   Hyperparameter Optimization |
|    10-Learning-Rate-Schedules       Learning Rate Schedules     |
+-----------------------------------------------------------------+
Appendix A. Extended Derivation and Diagnostic Cards
References
- Nocedal and Wright, Numerical Optimization.
- Bertsekas, Nonlinear Programming.
- Polyak, Introduction to Optimization.
- Nesterov, A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²).
- Goodfellow, Bengio, and Courville, Deep Learning.
- Bottou, Curtis, and Nocedal, Optimization Methods for Large-Scale Machine Learning.
- PyTorch optimizer and scheduler documentation.
- Optax documentation for composable optimizer transformations.