Math for LLMs, Part 3

Optimality Conditions: Supplement L: Quick Reference Tables

Multivariate Calculus / Optimality Conditions


Supplement L: Quick Reference Tables

Lagrange vs KKT: Choosing the Right Tool

| Situation | Method | Key Condition |
|---|---|---|
| No constraints | First/second-order conditions | $\nabla f = 0$, check $H$ |
| Equality constraints only | Lagrange multipliers | $\nabla f + \lambda^\top \nabla g = 0$ |
| Inequality constraints | KKT | 4 conditions including $\mu \geq 0$, complementary slackness |
| Mixed equality + inequality | KKT | Full 4-condition system |
| Convex problem | KKT (global) | KKT $\Leftrightarrow$ global min (Slater holds) |
| Non-convex problem | KKT (local) | KKT $\Rightarrow$ local min only |
| LP/QP | Interior point or simplex | KKT system solved directly |
| Non-smooth $f$ | Subdifferential | $0 \in \partial f + \partial h$ |
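The unconstrained row of the table can be verified numerically. The minimal sketch below uses an illustrative quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top Q\mathbf{x} - \mathbf{b}^\top\mathbf{x}$ (the values of $Q$ and $\mathbf{b}$ are arbitrary, chosen for the example): it solves the first-order condition $\nabla f = 0$ and then checks the Hessian for positive semidefiniteness.

```python
import numpy as np

# Illustrative quadratic: f(x) = 0.5 x^T Q x - b^T x, so grad f = Qx - b
# and the Hessian is the constant matrix Q. Values are arbitrary.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 2.0])

def grad(x):
    return Q @ x - b

# First-order condition: grad f(x*) = 0  =>  solve Q x* = b
x_star = np.linalg.solve(Q, b)
assert np.allclose(grad(x_star), 0.0)

# Second-order condition: Hessian must be positive semidefinite
eigvals = np.linalg.eigvalsh(Q)
is_local_min = bool(np.all(eigvals >= 0))
print(is_local_min)  # True: Q is positive definite, so x* is the global min
```

Because $Q \succ 0$ here, the problem is also convex, so the stationary point is the unique global minimizer, consistent with the "Convex problem" row.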

Shadow Price Interpretation Guide

| Context | Multiplier meaning |
|---|---|
| Budget constraint $\mathbf{1}^\top\mathbf{w} \leq B$ | $\lambda^*$ = value of one more unit of budget |
| Norm constraint $\lVert\mathbf{w}\rVert^2 \leq c$ | $\lambda^*$ = decrease in loss per unit norm increase |
| SVM margin $y(\mathbf{w}^\top\mathbf{x}+b) \geq 1$ | $\alpha_i^*$ = importance of sample $i$ to the decision boundary |
| KL constraint $D_{\text{KL}} \leq \delta$ in RLHF | $\beta^*$ = reward per unit KL allowed (the "temperature") |
| Power budget $\sum_i p_i = P$ (water-filling) | $\lambda^*$ = value of one more unit of transmit power |
| Expected return $\mu^\top\mathbf{w} = r$ (portfolio) | $\lambda^*$ = variance cost per unit extra expected return |
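The shadow-price reading "$\lambda^*$ = marginal value of relaxing the constraint" can be checked on a toy problem with a closed form. The sketch below (an assumed example, not from the lesson) minimizes $\lVert\mathbf{w}\rVert^2$ subject to $\mathbf{1}^\top\mathbf{w} = B$, where $w^* = (B/n)\mathbf{1}$, the optimal value is $V(B) = B^2/n$, and $\lambda^* = 2B/n$ under the convention $\mathcal{L} = \lVert\mathbf{w}\rVert^2 - \lambda(\mathbf{1}^\top\mathbf{w} - B)$. It confirms $\lambda^* \approx dV/dB$ by finite differences.

```python
import numpy as np

# Toy budget problem (assumed for illustration):
#   minimize ||w||^2  subject to  1^T w = B
# Closed form: w* = (B/n) 1, optimal value V(B) = B^2 / n, and the
# multiplier is lambda* = 2B/n with L = ||w||^2 - lambda (1^T w - B).
n, B = 4, 10.0

lam_star = 2 * B / n          # shadow price from the stationarity condition

def V(budget):                # optimal value as a function of the budget
    return budget**2 / n

# Shadow-price interpretation: lambda* = dV/dB (central finite difference)
eps = 1e-6
dV_dB = (V(B + eps) - V(B - eps)) / (2 * eps)
print(abs(dV_dB - lam_star) < 1e-6)  # True: the multiplier prices the budget
```

The sign of $\lambda^*$ depends on how the Lagrangian is written; the convention above is the one that makes $\lambda^* = dV/dB$ directly.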

Convexity Verification Checklist

| Check | Method | Outcome |
|---|---|---|
| $f: \mathbb{R} \to \mathbb{R}$ | $f''(x) \geq 0$ everywhere | Convex iff true |
| $f: \mathbb{R}^n \to \mathbb{R}$, $C^2$ | $\nabla^2 f(\mathbf{x}) \succeq 0$ everywhere | Convex iff true |
| $f$ is a sum | Each term convex? | Sum is convex |
| $f = g \circ A$ (affine precomposition) | $g$ convex? | $f$ convex |
| $f(\mathbf{x}) = \max_i g_i(\mathbf{x})$ | Each $g_i$ convex? | $f$ convex |
| $f(\mathbf{x}) = \log \sum_i e^{g_i(\mathbf{x})}$ | Each $g_i$ affine? | $f$ convex (log-sum-exp) |
| $f(\mathbf{x}) = \lVert A\mathbf{x}-\mathbf{b}\rVert^2$ | Always | Convex (quadratic, $H = 2A^\top A \succeq 0$) |
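The last row can be confirmed with a quick eigenvalue check. The sketch below builds an arbitrary random $A$ (illustrative only) and verifies that the Hessian $H = 2A^\top A$ of the least-squares objective is positive semidefinite, which by the second table row certifies convexity.

```python
import numpy as np

# Least-squares objective f(x) = ||Ax - b||^2; A is an arbitrary
# illustrative matrix (b does not affect the Hessian).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# The Hessian is constant: H = 2 A^T A
H = 2 * A.T @ A

# Convexity check from the table: H PSD everywhere  <=>  f convex.
# A small negative tolerance absorbs floating-point rounding.
eigvals = np.linalg.eigvalsh(H)
print(bool(np.all(eigvals >= -1e-10)))  # True: A^T A is always PSD
```

This works for any $A$: $\mathbf{v}^\top A^\top A \mathbf{v} = \lVert A\mathbf{v}\rVert^2 \geq 0$, so the eigenvalue check can never fail except by rounding.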
