
Minimax Theorem: 3. Matrix Games to 4. Minimax Theorem

3. Matrix Games

Matrix Games develops the part of the minimax theorem specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.

3.1 pure saddle points

Pure saddle points belong to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\max_{\mathbf{p}\in\Delta_m}\min_j (A^\top\mathbf{p})_j = \min_{\mathbf{q}\in\Delta_n}\max_i (A\mathbf{q})_i.

The formula gives the mathematical handle for pure saddle points. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Minimax reasoning chooses a strategy by its guaranteed performance against the strongest opponent response in a zero-sum game.

Worked reading.

The row player computes \max_p \min_q \mathbf{p}^\top A \mathbf{q} while the column player computes \min_q \max_p \mathbf{p}^\top A \mathbf{q}. The minimax theorem says these values agree for finite zero-sum games.
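
The reading can be checked mechanically. Below is a minimal sketch, assuming only numpy (the helper name pure_saddle_point is ours), that compares the pure maximin and minimax values of a payoff matrix:

import numpy as np

def pure_saddle_point(A):
    # A pure saddle point is an entry that is both the minimum of its
    # row and the maximum of its column; it exists exactly when the
    # pure maximin value equals the pure minimax value.
    row_mins = A.min(axis=1)      # each pure row action's guarantee
    col_maxs = A.max(axis=0)      # each pure column action's exposure
    lower, upper = row_mins.max(), col_maxs.min()
    if lower == upper:
        return int(row_mins.argmax()), int(col_maxs.argmin())
    return None                   # no pure saddle point: mixing is needed

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
print(pure_saddle_point(A))       # None here: lower value 2 < upper value 3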

Three examples of pure saddle points:

  1. Robust classification against bounded perturbations.
  2. A discriminator maximizing the generator's loss in a simplified GAN objective.
  3. Worst-case evaluation where the tester chooses the hardest valid case.

Two non-examples clarify the boundary:

  1. Average validation loss over a fixed dataset.
  2. General-sum bargaining where both players can gain together.

Proof or verification habit for pure saddle points:

The LP-duality proof writes each player's guarantee as a linear program; strong duality equates the two optimal values.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, pure saddle points are useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Minimax is the mathematical backbone of adversarial robustness, but only relative to the stated threat model.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using pure saddle points responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Check zero-sum structure before importing minimax conclusions.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Pure saddle points give the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.2 mixed strategies

Mixed strategies belong to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\min_\theta \max_{\boldsymbol{\delta}\in\mathcal{S}} \mathcal{L}(f_\theta(\mathbf{x}+\boldsymbol{\delta}), y).

The formula gives the mathematical handle for mixed strategies. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A mixed strategy is a probability distribution over actions. In equilibrium, actions used with positive probability must usually give the same expected payoff; otherwise probability can move to the better action.

Worked reading.

In matching pennies, the row player is indifferent only when the column player randomizes heads and tails equally. The same calculation makes the column player indifferent, giving the (1/2, 1/2) equilibrium.

Three examples of mixed strategies:

  1. Randomized audits that make attackers uncertain.
  2. Stochastic decoding policies that prevent deterministic exploitation.
  3. Exploration policies in self-play where pure repetition would be exploited.

Two non-examples clarify the boundary:

  1. Adding noise after choosing a deterministic losing action.
  2. A distribution that assigns probability to an action with strictly lower payoff while another supported action is better.

Proof or verification habit for mixed strategies:

Set expected payoffs of supported actions equal, solve for probabilities, then verify unsupported actions are not better.
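
A minimal sketch of that habit on matching pennies, assuming numpy; the payoff convention (row player wins +1 on a match) is the standard one:

import numpy as np

# Matching pennies from the row player's perspective: rows and columns
# are (heads, tails); the row player wins +1 on a match, loses 1 otherwise.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

# Indifference: the column mix (q, 1-q) must give the row player's two
# pure actions equal expected payoff, a single linear equation in q.
q = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])
print(q)                          # 0.5: uniform randomization

# Verification step: both supported row actions now tie in payoff.
print(A @ np.array([q, 1 - q]))   # [0. 0.]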

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, mixed strategies are useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Mixed strategies explain why robust systems often randomize: predictability can be a vulnerability when opponents adapt.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using mixed strategies responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Check both support equality and off-support inequalities.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Mixed strategies give the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.3 row player LP

The row player's LP belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

v^- = \max_{\mathbf{p}\in\Delta_m}\min_{\mathbf{q}\in\Delta_n} \mathbf{p}^\top A\,\mathbf{q}.

The formula gives the mathematical handle for the row player's LP. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Finite zero-sum games can be solved as primal-dual linear programs over probability simplices and value variables.

Worked reading.

The row player maximizes v subject to every column giving expected payoff at least v. The column player minimizes w subject to every row giving expected payoff at most w.
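
One concrete formulation, sketched with scipy's linprog; the variable stacking and the helper name row_player_lp are our choices, not a canonical solver:

import numpy as np
from scipy.optimize import linprog

def row_player_lp(A):
    # Variables x = [p_1, ..., p_m, v]; maximize v, i.e. minimize -v,
    # subject to (A^T p)_j >= v for all columns j and p on the simplex.
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])              # v - (A^T p)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum(p) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]              # p >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
p, v = row_player_lp(rps)
print(p, v)   # roughly [1/3, 1/3, 1/3] with value 0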

Three examples of the row player's LP:

  1. Solving rock-paper-scissors as an LP.
  2. Computing an adversarial test mixture with linear constraints.
  3. Finding a randomized defense allocation under budget constraints.

Two non-examples clarify the boundary:

  1. Using an unconstrained optimizer that ignores probabilities summing to one.
  2. Applying LP duality to a nonlinear continuous attack without convexity assumptions.

Proof or verification habit for the row player's LP:

The dual variables are the opponent's mixed strategy; complementary slackness explains why supported actions tie in payoff.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the row player's LP is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

LP form makes game values auditable: constraints correspond directly to opponent deviations.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the row player's LP responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Inspect primal feasibility, dual feasibility, and matching objective values.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The row player's LP gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.4 column player dual

The column player's dual belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

v^+ = \min_{\mathbf{q}\in\Delta_n}\max_{\mathbf{p}\in\Delta_m} \mathbf{p}^\top A\,\mathbf{q}.

The formula gives the mathematical handle for the column player's dual. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

The game value is the payoff both sides can guarantee in a finite zero-sum game when they use optimal mixed strategies.

Worked reading.

The row player's guarantee is the lower value; the column player's guarantee is the upper value. The minimax theorem states these agree in finite zero-sum games.
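
The upper value is the mirror-image program. A sketch under the same assumptions as the row player's LP above (scipy's linprog; the helper name column_player_lp is ours):

import numpy as np
from scipy.optimize import linprog

def column_player_lp(A):
    # Variables x = [q_1, ..., q_n, w]; minimize w subject to
    # (A q)_i <= w for all rows i and q on the simplex.
    m, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0
    A_ub = np.hstack([A, -np.ones((m, 1))])                # (A q)_i - w <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum(q) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]              # q >= 0, w free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
q, upper = column_player_lp(rps)
print(q, upper)   # roughly [1/3, 1/3, 1/3]; upper value 0 matches the lower value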

Three examples of the column player's dual:

  1. The value of rock-paper-scissors is zero.
  2. A robust classifier's worst-case loss is the value of a specified attack game.
  3. Yao-style lower bounds swap randomized algorithms and input distributions under minimax reasoning.

Two non-examples clarify the boundary:

  1. A general-sum welfare score.
  2. A last-iterate training reward.

Proof or verification habit for the column player's dual:

The finite proof can be read through LP duality: the column player's dual program certifies the same scalar value as the row player's primal program.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the column player's dual is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

The value is useful because it is a guarantee, not a hope about average-case behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the column player's dual responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Check the assumptions: finite action sets or the right compactness, convexity, and continuity conditions for extensions.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The column player's dual gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.5 rock-paper-scissors

Rock-paper-scissors belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\max_{\mathbf{p}\in\Delta_m}\min_j (A^\top\mathbf{p})_j = \min_{\mathbf{q}\in\Delta_n}\max_i (A\mathbf{q})_i.

The formula gives the mathematical handle for rock-paper-scissors. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Rock-paper-scissors is the canonical finite zero-sum game with no pure saddle point and a uniform mixed equilibrium.

Worked reading.

Each pure action is beaten by another pure action, so deterministic play is exploitable. Uniform mixing makes the opponent indifferent among all pure actions.

Three examples of rock-paper-scissors:

  1. A toy model of cyclic best responses.
  2. A clean demonstration of exploitability.
  3. A notebook benchmark for fictitious play or no-regret updates.

Two non-examples clarify the boundary:

  1. A coordination game.
  2. A game with a dominant strategy.

Proof or verification habit for rock-paper-scissors:

Set the expected payoff of rock, paper, and scissors equal; the only symmetric solution is the uniform distribution.
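
The same equilibrium also emerges from learning dynamics. A minimal fictitious-play sketch, assuming numpy (the unit initial counts are a smoothing choice of ours), shows the empirical mixtures drifting to uniform even though the actions cycle:

import numpy as np

rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

# Fictitious play: each side best-responds to the opponent's empirical mix.
# Pure play cycles rock -> paper -> scissors, but the empirical frequencies
# still approach the uniform equilibrium.
row_counts = np.ones(3)
col_counts = np.ones(3)
for t in range(20000):
    i = int(np.argmax(rps @ (col_counts / col_counts.sum())))    # row best response
    j = int(np.argmin(rps.T @ (row_counts / row_counts.sum())))  # column best response
    row_counts[i] += 1
    col_counts[j] += 1

print(row_counts / row_counts.sum())   # both near [1/3, 1/3, 1/3]
print(col_counts / col_counts.sum())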

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, rock-paper-scissors is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

RPS is small, but its cycling behavior mirrors larger adversarial training dynamics.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using rock-paper-scissors responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: If a learner puts too much mass on one action, compute the opponent's exploiting action.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Rock-paper-scissors gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

4. Minimax Theorem

This section develops the core of the minimax theorem as specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.

4.1 theorem statement

The theorem statement belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\min_\theta \max_{\boldsymbol{\delta}\in\mathcal{S}} \mathcal{L}(f_\theta(\mathbf{x}+\boldsymbol{\delta}), y).

The formula gives the mathematical handle for the theorem statement. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

The game value is the payoff both sides can guarantee in a finite zero-sum game when they use optimal mixed strategies.

Worked reading.

The row player's guarantee is the lower value; the column player's guarantee is the upper value. The minimax theorem states these agree in finite zero-sum games.
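
For reference, the finite statement written out in full, in von Neumann's 1928 form:

\textbf{Theorem (von Neumann, 1928).}
Let $A \in \mathbb{R}^{m \times n}$ be the payoff matrix of a finite
two-player zero-sum game. Then
\[
  \max_{\mathbf{p}\in\Delta_m}\min_{\mathbf{q}\in\Delta_n}
    \mathbf{p}^\top A\,\mathbf{q}
  \;=\;
  \min_{\mathbf{q}\in\Delta_n}\max_{\mathbf{p}\in\Delta_m}
    \mathbf{p}^\top A\,\mathbf{q},
\]
and the common value $v$ is attained by some optimal pair
$(\mathbf{p}^*, \mathbf{q}^*) \in \Delta_m \times \Delta_n$.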

Three examples of the theorem statement:

  1. The value of rock-paper-scissors is zero.
  2. A robust classifier's worst-case loss is the value of a specified attack game.
  3. Yao-style lower bounds swap randomized algorithms and input distributions under minimax reasoning.

Two non-examples clarify the boundary:

  1. A general-sum welfare score.
  2. A last-iterate training reward.

Proof or verification habit for the theorem statement:

The finite proof can be read through LP duality: the column player's dual program certifies the same scalar value as the row player's primal program.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the theorem statement is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

The value is useful because it is a guarantee, not a hope about average-case behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the theorem statement responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Check the assumptions: finite action sets or the right compactness, convexity, and continuity conditions for extensions.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The theorem statement gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

4.2 geometric simplex intuition

Geometric simplex intuition belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

v^- = \max_{\mathbf{p}\in\Delta_m}\min_{\mathbf{q}\in\Delta_n} \mathbf{p}^\top A\,\mathbf{q}.

The formula gives the mathematical handle for geometric simplex intuition. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A mixed strategy is a probability distribution over actions. In equilibrium, actions used with positive probability must usually give the same expected payoff; otherwise probability can move to the better action.

Worked reading.

In matching pennies, the row player is indifferent only when the column player randomizes heads and tails equally. The same calculation makes the column player indifferent, giving the (1/2, 1/2) equilibrium.
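
The simplex picture can be computed directly. A minimal sketch, assuming numpy (the 2x2 matrix is an arbitrary example of ours), traces the lower envelope of the column lines over the 1-simplex and finds its peak:

import numpy as np

# For a 2-row game, a row mix is p = (t, 1-t), a segment of the simplex.
# Each column j contributes a line t -> (A^T p)_j; the row player's
# guarantee is the lower envelope of those lines, a piecewise-linear
# concave function whose peak is the maximin mix.
A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
ts = np.linspace(0.0, 1.0, 1001)
P = np.stack([ts, 1.0 - ts])            # all row mixes along the segment
envelope = (A.T @ P).min(axis=0)        # worst column response at each mix
t_star = ts[envelope.argmax()]
print(t_star, envelope.max())           # peak at t* = 0.5 with value 2.5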

Three examples of geometric simplex intuition:

  1. Randomized audits that make attackers uncertain.
  2. Stochastic decoding policies that prevent deterministic exploitation.
  3. Exploration policies in self-play where pure repetition would be exploited.

Two non-examples clarify the boundary:

  1. Adding noise after choosing a deterministic losing action.
  2. A distribution that assigns probability to an action with strictly lower payoff while another supported action is better.

Proof or verification habit for geometric simplex intuition:

Set expected payoffs of supported actions equal, solve for probabilities, then verify unsupported actions are not better.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, geometric simplex intuition is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Mixed strategies explain why robust systems often randomize: predictability can be a vulnerability when opponents adapt.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using geometric simplex intuition responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Check both support equality and off-support inequalities.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Geometric simplex intuition gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

4.3 LP duality proof sketch

The LP duality proof sketch belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

v^+ = \min_{\mathbf{q}\in\Delta_n}\max_{\mathbf{p}\in\Delta_m} \mathbf{p}^\top A\,\mathbf{q}.

The formula gives the mathematical handle for the LP duality proof sketch. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Finite zero-sum games can be solved as primal-dual linear programs over probability simplices and value variables.

Worked reading.

The row player maximizes v subject to every column giving expected payoff at least v. The column player minimizes w subject to every row giving expected payoff at most w.

Three examples of the LP duality proof sketch:

  1. Solving rock-paper-scissors as an LP.
  2. Computing an adversarial test mixture with linear constraints.
  3. Finding a randomized defense allocation under budget constraints.

Two non-examples clarify the boundary:

  1. Using an unconstrained optimizer that ignores probabilities summing to one.
  2. Applying LP duality to a nonlinear continuous attack without convexity assumptions.

Proof or verification habit for the LP duality proof sketch:

The dual variables are the opponent's mixed strategy; complementary slackness explains why supported actions tie in payoff.
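
A minimal numerical version of that habit, assuming numpy; the uniform candidate pair is the known rock-paper-scissors equilibrium:

import numpy as np

A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
p = np.full(3, 1 / 3)     # candidate primal solution (row strategy)
q = np.full(3, 1 / 3)     # candidate dual solution (column strategy)
v = p @ A @ q             # candidate value

row_payoffs = A @ q       # each pure row action against q
col_payoffs = A.T @ p     # each pure column action against p

# Complementary slackness: supported actions tie at the value, and no
# deviation does better. All four checks pass for the uniform pair.
assert np.allclose(row_payoffs[p > 0], v)
assert np.allclose(col_payoffs[q > 0], v)
assert np.all(row_payoffs <= v + 1e-12)
assert np.all(col_payoffs >= v - 1e-12)
print("uniform play certified, value =", v)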

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the LP duality proof sketch is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

LP form makes game values auditable: constraints correspond directly to opponent deviations.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the LP duality proof sketch responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Inspect primal feasibility, dual feasibility, and matching objective values.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The LP duality proof sketch gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

4.4 relation to Nash equilibrium

The relation to Nash equilibrium belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\max_{\mathbf{p}\in\Delta_m}\min_j (A^\top\mathbf{p})_j = \min_{\mathbf{q}\in\Delta_n}\max_i (A\mathbf{q})_i.

The formula gives the mathematical handle for the relation to Nash equilibrium. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A Nash equilibrium is a profile of strategies where no player can improve by changing its own strategy while all other strategies remain fixed.

Worked reading.

In the prisoner's dilemma payoff convention, mutual defection can be a Nash equilibrium even when mutual cooperation is better for both players. This is the central warning: stability and desirability are different properties.

Three examples of the relation to Nash equilibrium:

  1. A self-play policy pair where neither side has a profitable unilateral exploit.
  2. A GAN fixed point where the generator distribution matches data and the discriminator cannot improve classification.
  3. A routing market where no model provider benefits from changing only its bid.

Two non-examples clarify the boundary:

  1. A high-welfare outcome with a profitable unilateral deviation.
  2. A training checkpoint with low loss but a large best-response exploit.

Proof or verification habit for the relation to Nash equilibrium:

The proof is a universal deviation check: for each player i, hold \pi_{-i} fixed and show u_i(\pi_i^*, \pi_{-i}^*) \ge u_i(\pi_i, \pi_{-i}^*) for all allowed \pi_i.
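
For matrix games the deviation check is mechanical. A sketch assuming numpy (the helper name exploitability is ours):

import numpy as np

def exploitability(A, p, q):
    # How much each side gains by unilaterally best-responding in the
    # zero-sum game A (row maximizes, column minimizes). A pair is a
    # Nash equilibrium exactly when both gains are zero.
    value = p @ A @ q
    row_gain = (A @ q).max() - value      # best row deviation against q
    col_gain = value - (A.T @ p).min()    # best column deviation against p
    return row_gain, col_gain

A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1 / 3)
rock = np.array([1.0, 0.0, 0.0])
print(exploitability(A, uniform, uniform))   # (0.0, 0.0): equilibrium
print(exploitability(A, rock, uniform))      # column gains 1 by playing paper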

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the relation to Nash equilibrium is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

For AI agents, Nash is a stability diagnostic. It does not guarantee safety, alignment, fairness, or global efficiency unless those objectives are encoded in the game.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the relation to Nash equilibrium responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Ask: if one deployed model, user, or attacker changed behavior alone, would it gain?

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The relation to Nash equilibrium gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

4.5 finite vs continuous games preview

The finite vs. continuous games preview belongs to the canonical scope of the Minimax Theorem. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is zero-sum matrix games, maximin and minimax values, saddle points, LP duality, no-regret approximation, and robust AI objectives. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\min_\theta \max_{\boldsymbol{\delta}\in\mathcal{S}} \mathcal{L}(f_\theta(\mathbf{x}+\boldsymbol{\delta}), y).

The formula gives the mathematical handle for the finite vs. continuous games preview. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

The game value is the payoff both sides can guarantee in a finite zero-sum game when they use optimal mixed strategies.

Worked reading.

The row player's guarantee is the lower value; the column player's guarantee is the upper value. The minimax theorem states these agree in finite zero-sum games.

Three examples of the finite vs. continuous games preview:

  1. The value of rock-paper-scissors is zero.
  2. A robust classifier's worst-case loss is the value of a specified attack game.
  3. Yao-style lower bounds swap randomized algorithms and input distributions under minimax reasoning.

Two non-examples clarify the boundary:

  1. A general-sum welfare score.
  2. A last-iterate training reward.

Proof or verification habit for the finite vs. continuous games preview:

The finite proof can be read through LP duality: the column player's dual program certifies the same scalar value as the row player's primal program.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the finite vs. continuous games preview is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

The value is useful because it is a guarantee, not a hope about average-case behavior.
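
As a preview of what changes beyond finite matrices, here is a minimal sketch, assuming numpy, of simultaneous gradient descent-ascent on the bilinear objective f(x, y) = xy; the step size and iteration counts are arbitrary choices of ours:

import numpy as np

# Simultaneous gradient descent-ascent on f(x, y) = x * y. The unique
# saddle point is (0, 0), but naive simultaneous updates spiral outward:
# each step multiplies the radius by sqrt(1 + lr^2).
lr = 0.1
x, y = 1.0, 1.0
for t in range(100):
    x, y = x - lr * y, y + lr * x     # min player descends, max player ascends
print(np.hypot(x, y))                 # larger than the initial radius: divergence

# Extragradient: evaluate gradients at a lookahead point. The same game
# now contracts toward the saddle point.
x, y = 1.0, 1.0
for t in range(200):
    xh, yh = x - lr * y, y + lr * x   # lookahead half step
    x, y = x - lr * yh, y + lr * xh   # update with lookahead gradients
print(np.hypot(x, y))                 # smaller than the initial radius: convergence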

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the finite vs. continuous games preview responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Check the assumptions: finite action sets or the right compactness, convexity, and continuity conditions for extensions.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The finite vs. continuous games preview gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective
