Nash Equilibria: Part 5 (Existence and Computation) and Part 6 (AI Applications)
5. Existence and Computation
This part develops the existence-and-computation portion of Nash equilibria specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.
5.1 Nash existence theorem
The Nash existence theorem belongs to the canonical scope of Nash equilibria. The central object is not a single optimizer but a system of decision makers whose objectives interact.
Throughout this part, the working scope is normal-form games, pure and mixed strategies, best responses, Nash equilibria, existence, computation, and AI equilibrium failures. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.
The defining inequality gives the mathematical handle for the Nash existence theorem: a strategy profile $(\pi_1^*, \dots, \pi_n^*)$ is a Nash equilibrium when, for every player $i$ and every admissible strategy $\pi_i$, $u_i(\pi_i^*, \pi_{-i}^*) \ge u_i(\pi_i, \pi_{-i}^*)$. In game theory, this inequality should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.
| Game object | Meaning | AI interpretation |
|---|---|---|
| Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent |
| Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample |
| Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy |
| Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget |
| Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game |
Operational definition.
A Nash equilibrium is a profile of strategies where no player can improve by changing its own strategy while all other strategies remain fixed.
Worked reading.
In the prisoner's dilemma payoff convention, mutual defection can be a Nash equilibrium even when mutual cooperation is better for both players. This is the central warning: stability and desirability are different properties.
Three examples of the Nash existence theorem in practice:
- A self-play policy pair where neither side has a profitable unilateral exploit.
- A GAN fixed point where the generator distribution matches data and the discriminator cannot improve classification.
- A routing market where no model provider benefits from changing only its bid.
Two non-examples clarify the boundary:
- A high-welfare outcome with a profitable unilateral deviation.
- A training checkpoint with low loss but a large best-response exploit.
Proof or verification habit for the Nash existence theorem:
The proof is a universal deviation check: for each player $i$, hold $\pi_{-i}^*$ fixed and show $u_i(\pi_i^*, \pi_{-i}^*) \ge u_i(\pi_i, \pi_{-i}^*)$ for all allowed $\pi_i$.
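As a concrete check, here is a minimal sketch in Python, assuming one standard prisoner's dilemma payoff convention (3 for mutual cooperation, 5 for unilateral defection, 1 for mutual defection, 0 for being exploited); the numbers are illustrative, not canonical.

```python
import numpy as np

# Prisoner's dilemma payoffs for the row player; actions: 0 = cooperate, 1 = defect.
A = np.array([[3, 0],
              [5, 1]])
B = A.T   # symmetric game: column player's payoffs

def is_pure_nash(A, B, i, j):
    """Universal deviation check at the pure profile (i, j)."""
    row_ok = all(A[i, j] >= A[k, j] for k in range(A.shape[0]))
    col_ok = all(B[i, j] >= B[i, k] for k in range(B.shape[1]))
    return row_ok and col_ok

for i in range(2):
    for j in range(2):
        print((i, j), is_pure_nash(A, B, i, j))
# Only (1, 1), mutual defection, survives every unilateral deviation,
# even though (0, 0), mutual cooperation, pays both players more.
```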
The contrast with neighboring problem types is worth keeping explicit:
- Single-agent optimization: choose $\theta$ to minimize $L(\theta)$.
- Game-theoretic optimization: choose $\pi_i$ while the others choose $\pi_{-i}$.
- Adversarial objective: choose a defense against the best attack.
- Multi-agent learning: the policies themselves change the environment.
In AI systems, the Nash existence theorem is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.
For AI agents, Nash is a stability diagnostic. It does not guarantee safety, alignment, fairness, or global efficiency unless those objectives are encoded in the game.
Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.
Checklist for using equilibrium analysis responsibly:
- State the players and their objectives.
- State the action spaces and information structure.
- Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
- Identify pure, mixed, or policy strategies.
- Compute best responses or exploitability before claiming stability (see the numeric sketch after this checklist).
- Separate equilibrium analysis from welfare analysis.
- Explain what changes if opponents adapt.
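As the computation item above suggests, here is a minimal exploitability sketch: it reports each player's best-response gain at a mixed profile, using matching pennies as the illustrative game; both gains are zero exactly at a Nash equilibrium.

```python
import numpy as np

def exploitability(A, B, x, y):
    """Best-response gains for the row and column players at profile (x, y)."""
    gain_row = (A @ y).max() - x @ A @ y   # best pure deviation for row
    gain_col = (x @ B).max() - x @ B @ y   # best pure deviation for column
    return float(gain_row), float(gain_col)

# Matching pennies: uniform mixing is the unique equilibrium.
A = np.array([[1., -1.], [-1., 1.]])
B = -A
print(exploitability(A, B, np.array([.5, .5]), np.array([.5, .5])))  # (0.0, 0.0)
# A biased row player is exploitable: the column player would gain 0.4
# by switching to its best pure response.
print(exploitability(A, B, np.array([.7, .3]), np.array([.5, .5])))  # (0.0, 0.4)
```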
Local diagnostic: if one deployed model, user, or attacker changed behavior alone, would it gain?
This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.
Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The Nash existence theorem gives the language to reason about that pressure.
A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.
| Diagnostic question | Game-theoretic discipline it tests |
|---|---|
| Who can respond? | Player modeling |
| What can they change? | Action space |
| What do they want? | Payoff design |
| Can one side commit first? | Stackelberg structure |
| Is the worst case important? | Minimax or robust objective |
5.2 Fixed-point intuition
Operational definition.
Existence theorems show that under finite action sets and mixed strategies, at least one equilibrium exists even when no pure equilibrium exists.
Worked reading.
The best-response map sends a mixed profile to the set of best responses. Fixed-point theorems guarantee a profile that is consistent with its own best-response set.
Three examples of fixed-point intuition:
- Matching pennies has no pure equilibrium but has a mixed equilibrium.
- Rock-paper-scissors has the uniform mixed equilibrium.
- Finite routing games can have mixed equilibria even when deterministic routing cycles.
Two non-examples clarify the boundary:
- A convergence guarantee for gradient descent.
- A guarantee that equilibrium is unique.
Proof or verification habit for fixed-point intuition:
The proof idea is not to enumerate all games. It builds a continuous or set-valued response map over compact convex simplices and applies a fixed-point theorem.
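A small numerical illustration, not a proof: in rock-paper-scissors the uniform mixture lies inside its own best-response set, which is exactly the fixed-point property that the general theorems guarantee.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (zero-sum).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

uniform = np.ones(3) / 3

# Against a uniform opponent every pure action earns the same payoff,
# so every mixture, uniform included, is a best response:
print(A @ uniform)   # [0. 0. 0.]

# Uniform therefore maps to a best-response set containing itself: a fixed
# point of the best-response correspondence, hence a mixed equilibrium.
# No pure profile works, since each pure action is beaten by another.
```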
This is why equilibrium can be a mathematically well-defined target even when training dynamics are unstable.
Local diagnostic: remember that existence is not computation, and computation is not deployment safety.
5.3 Lemke-Howson preview
Operational definition.
Computing equilibria is an algorithmic problem with its own complexity, approximation, and representation issues.
Worked reading.
Support enumeration guesses which actions receive positive probability, solves indifference equations, and checks inequalities. Lemke-Howson gives a pivoting method for two-player games. Correlated equilibrium enlarges the solution concept using signals.
Three examples of equilibrium computation:
- A small two-player game solved by support enumeration.
- A traffic-routing game where correlated signals improve coordination.
- A large multi-agent benchmark where exact Nash search is computationally unrealistic.
Two non-examples clarify the boundary:
- Assuming an equilibrium solver scales because the theorem says an equilibrium exists.
- Confusing a correlated equilibrium with independent mixed strategies.
Proof or verification habit for the Lemke-Howson preview:
The proof obligation moves from mathematical existence to certificate checking: payoffs, supports, probabilities, and deviation inequalities must all be validated.
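A minimal support-enumeration sketch for the simplest case, the full support of a 2x2 bimatrix game: solve the two indifference equations, then check that the resulting probabilities are valid. Full Lemke-Howson pivoting is more involved and is only previewed in this subsection.

```python
import numpy as np

def fully_mixed_2x2(A, B):
    """Try the support {0,1} x {0,1}: indifference equations plus validity checks."""
    denom_q = A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1]   # row player's indifference
    denom_p = B[0, 0] - B[1, 0] - B[0, 1] + B[1, 1]   # column player's indifference
    if denom_q == 0 or denom_p == 0:
        return None
    q = (A[1, 1] - A[0, 1]) / denom_q   # column's probability of action 0
    p = (B[1, 1] - B[1, 0]) / denom_p   # row's probability of action 0
    if not (0 < p < 1 and 0 < q < 1):
        return None   # the inequality check fails: try other supports instead
    return np.array([p, 1 - p]), np.array([q, 1 - q])

# Matching pennies has no pure equilibrium; the full support succeeds.
A = np.array([[1., -1.], [-1., 1.]])
print(fully_mixed_2x2(A, -A))   # (array([0.5, 0.5]), array([0.5, 0.5]))
```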
Local diagnostic: Always report the approximation notion and residual deviation gain.
5.4 Correlated equilibrium preview
Operational definition.
A correlated equilibrium is a distribution over joint action profiles, implemented by a shared signal, such that no player gains by deviating from its recommended action given what that recommendation reveals about the others.
Worked reading.
A traffic light is the canonical device: each driver is told stop or go, and obeying is optimal given that the other driver is being told the complementary action. Every Nash equilibrium is a correlated equilibrium, but correlation can reach outcomes, and welfare levels, that independent mixing cannot.
Three examples of the correlated equilibrium preview:
- A traffic-light signal coordinating two drivers at an intersection.
- A routing coordinator recommending routes so that no agent wants to defy its recommendation.
- Mediated load balancing among model providers via a shared scheduler.
Two non-examples clarify the boundary:
- Independent mixed strategies, which form the special case with an uninformative signal.
- A recommendation scheme that players would profitably ignore, which is merely advice, not an equilibrium.
Proof or verification habit for the correlated equilibrium preview:
For each player and each recommended action, verify the conditional deviation inequalities. Since these inequalities are linear in the distribution, existence and optimization reduce to a linear program.
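A minimal sketch of that linear program, assuming SciPy is available and using the game of chicken as an illustrative payoff choice; the solver returns a welfare-maximizing correlated equilibrium, whatever its exact support turns out to be.

```python
import numpy as np
from scipy.optimize import linprog

# Game of chicken. Actions: 0 = dare, 1 = yield; U1 is the row player's payoff.
U1 = np.array([[0., 7.],
               [2., 6.]])
U2 = U1.T          # symmetric game
n = 2

rows = []          # obedience constraints, written as A_ub @ p <= 0
for a in range(n):             # player 1: recommended a, deviation a2
    for a2 in range(n):
        if a2 != a:
            row = np.zeros((n, n))
            row[a, :] = U1[a2, :] - U1[a, :]   # deviation gain must not be positive
            rows.append(row.ravel())
for b in range(n):             # player 2: recommended b, deviation b2
    for b2 in range(n):
        if b2 != b:
            row = np.zeros((n, n))
            row[:, b] = U2[:, b2] - U2[:, b]
            rows.append(row.ravel())

res = linprog(c=-(U1 + U2).ravel(),                   # maximize total welfare
              A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
              A_eq=np.ones((1, n * n)), b_eq=[1.0])   # probabilities sum to 1
print(res.x.reshape(n, n).round(3))   # a distribution over joint actions
```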
Local diagnostic: ask whether each player would still follow the signal's recommendation, given what that recommendation reveals about the others.
5.5 Computational hardness
Operational definition.
Computing equilibria is an algorithmic problem in its own right, and for general finite games it is provably hard: the problem is complete for the complexity class PPAD.
Worked reading.
Finding one Nash equilibrium is PPAD-complete even with only two players. The hardness has a different flavor from NP-completeness: a solution always exists by Nash's theorem, so the difficulty lies entirely in locating it. Lemke-Howson always terminates, but there are instances on which it takes exponentially many pivots.
Three examples of computational hardness:
- Support enumeration must in principle try exponentially many candidate supports.
- Worst-case Lemke-Howson instances require exponentially many pivot steps.
- A large multi-agent benchmark where exact Nash search is computationally unrealistic.
Two non-examples clarify the boundary:
- Two-player zero-sum games, which reduce to linear programming and are solvable in polynomial time.
- Correlated equilibria, which form a polytope and can be computed by linear programming even in general-sum games.
Proof or verification habit for computational hardness:
Separate finding from checking. Finding an equilibrium may be intractable, but checking a proposed equilibrium certificate is cheap: payoffs, supports, probabilities, and deviation inequalities can all be validated directly.
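A sketch of such a certificate checker for a bimatrix game; the tolerance and the report fields are illustrative choices, not a standard interface. It extends the earlier exploitability sketch with probability and support validation.

```python
import numpy as np

def check_certificate(A, B, x, y, eps=1e-9):
    """Validate a claimed equilibrium: probabilities, supports, deviation gains."""
    report = {}
    report["valid probabilities"] = bool(
        abs(x.sum() - 1) < eps and abs(y.sum() - 1) < eps
        and (x >= -eps).all() and (y >= -eps).all())
    row_payoffs, col_payoffs = A @ y, x @ B
    # Every action played with positive probability must be payoff-maximal.
    report["row support optimal"] = bool(
        np.all(row_payoffs[x > eps] >= row_payoffs.max() - eps))
    report["col support optimal"] = bool(
        np.all(col_payoffs[y > eps] >= col_payoffs.max() - eps))
    # Residual deviation gains; both are zero at an exact equilibrium.
    report["row deviation gain"] = float(row_payoffs.max() - x @ A @ y)
    report["col deviation gain"] = float(col_payoffs.max() - x @ B @ y)
    return report

A = np.array([[1., -1.], [-1., 1.]])   # matching pennies again
print(check_certificate(A, -A, np.array([.5, .5]), np.array([.5, .5])))
```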
AI systems often use approximate or learned equilibria because exact game solving is too expensive at model scale.
Local diagnostic: state whether a reported equilibrium is exact, approximate with an explicit epsilon, or merely a fixed point of the training dynamics.
6. AI Applications
This part develops the AI-applications portion of Nash equilibria specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.
6.1 GAN training dynamics
Operational definition.
Generative, evaluation, and deployment games arise when model behavior changes in response to the measurement or defense mechanism.
Worked reading.
In a GAN, the discriminator improves its classifier while the generator improves samples to fool it. In red-team evaluation, the attacker improves examples after seeing failures of the defense.
Three examples of adversarial training and evaluation dynamics:
- GAN generator-discriminator training.
- Jailbreak discovery against a deployed policy layer.
- Benchmark gaming where systems optimize for the public metric instead of the intended task.
Two non-examples clarify the boundary:
- One-time evaluation on a frozen hidden test set.
- A content filter measured only against historical prompts.
Proof or verification habit for GAN training dynamics:
The mathematical proof obligation is to identify the adaptive loop and the payoff each side optimizes.
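The instability is visible in the smallest possible case. The sketch below runs simultaneous gradient descent-ascent on the bilinear game min over g, max over d of g*d, whose unique equilibrium is the origin; the starting point and learning rate are arbitrary illustrative choices.

```python
# Bilinear toy GAN: generator g minimizes g*d, discriminator d maximizes it.
g, d, lr = 1.0, 1.0, 0.1
for step in range(1, 101):
    grad_g, grad_d = d, g                       # partial derivatives of g*d
    g, d = g - lr * grad_g, d + lr * grad_d     # simultaneous update
    if step % 25 == 0:
        print(step, (g * g + d * d) ** 0.5)     # distance from the equilibrium
# The distance grows by a factor sqrt(1 + lr^2) every step: the dynamics
# spiral outward even though the equilibrium (0, 0) is perfectly well defined.
```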
Many LLM safety and evaluation failures are game failures: optimizing the metric changes the population of attempts.
Local diagnostic: Ask who can observe the metric, adapt to it, and benefit from adaptation.
6.2 Self-play
Operational definition.
Many-agent learning means every learner's policy is part of the environment seen by the others.
Worked reading.
If agent 1 changes its policy, agent 2's data distribution changes even when the physical simulator is unchanged. That is the core nonstationarity of multi-agent learning.
Three examples of self-play:
- Self-play agents improving by training against earlier or current versions.
- LLM tool agents changing each other's context and options.
- A routing marketplace where traffic shifts after one provider changes quality.
Two non-examples clarify the boundary:
- A single-agent RL problem with a fixed transition kernel.
- Batch supervised learning on immutable labels.
Proof or verification habit for self-play:
Analyze the joint policy trajectory, not only individual losses.
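A joint-trajectory sketch using fictitious play on matching pennies, one classical self-play scheme: each player best-responds to the other's empirical action frequencies. In two-player zero-sum games the empirical frequencies converge to equilibrium even though each round's play keeps oscillating.

```python
import numpy as np

A = np.array([[1., -1.], [-1., 1.]])          # matching pennies, row's payoffs
counts_r, counts_c = np.ones(2), np.ones(2)   # smoothed counts of observed actions

for t in range(20000):
    a_r = int(np.argmax(A @ (counts_c / counts_c.sum())))    # row best-responds
    a_c = int(np.argmin((counts_r / counts_r.sum()) @ A))    # column (zero-sum) too
    counts_r[a_r] += 1
    counts_c[a_c] += 1

print(counts_r / counts_r.sum(), counts_c / counts_c.sum())
# Both frequency vectors approach (0.5, 0.5), the mixed equilibrium, although
# every individual round is a pure best response to a moving target.
```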
Agentic LLM systems make multi-agent math practical: prompts, tools, memory, and policies interact in a shared state.
Local diagnostic: Ask which part of another agent's behavior enters this agent's observation or reward.
6.3 Model routing competition
Operational definition.
In a routing game, model providers choose prices, capacities, or advertised quality, while a router or user population chooses how to allocate queries; each provider's payoff depends on the traffic it attracts and the cost of serving it.
Worked reading.
If one provider undercuts on price or improves latency, traffic shifts toward it until congestion or competing offers erase the advantage. Stability means no provider gains by changing only its own bid, and no query would be served better by a different route.
Three examples of model routing competition:
- Providers bidding for traffic from a cost-and-quality-aware router.
- A congestion effect where agents pile onto the currently fastest endpoint until it is no longer fastest.
- Benchmark-driven competition where providers tune models to win routed evaluations.
Two non-examples clarify the boundary:
- A fixed round-robin router that ignores price and quality, so no provider's choice affects its traffic.
- A monopoly provider with no competing route that could respond.
Proof or verification habit for model routing competition:
Hold all other bids fixed and check every unilateral change of one provider's bid, price, or quality; repeat for each provider. The sketch below automates this check on a small discrete pricing game.
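A minimal sketch of that deviation check on an illustrative discrete pricing game; the three-point price grid and the winner-take-all demand rule are assumptions chosen for brevity, not a model of any real marketplace.

```python
from itertools import product

# Two providers pick a price; a unit of demand goes to the cheaper provider
# and splits on ties. Payoff = price times the share of demand captured.
PRICES = [1, 2, 3]

def payoff(p_own, p_other):
    share = 1.0 if p_own < p_other else (0.5 if p_own == p_other else 0.0)
    return p_own * share

def pure_nash_profiles():
    out = []
    for p1, p2 in product(PRICES, PRICES):
        ok1 = all(payoff(p1, p2) >= payoff(q, p2) for q in PRICES)
        ok2 = all(payoff(p2, p1) >= payoff(q, p1) for q in PRICES)
        if ok1 and ok2:
            out.append((p1, p2))
    return out

print(pure_nash_profiles())   # [(1, 1), (2, 2)]: undercutting stops exactly
# where a further cut no longer strictly pays, a discrete Bertrand flavor.
```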
Local diagnostic: ask whether any provider could gain by unilaterally changing only its own bid, price, or advertised quality.
6.4 Tool-use equilibria
Operational definition.
Tool-use equilibria arise when multiple agents draw on shared tools, APIs, caches, or context, so that one agent's calls change the latency, cost, or results the others see.
Worked reading.
Two agents sharing a rate-limited endpoint face a congestion problem: if both hammer the nominally fastest tool, its effective latency rises, and the stable outcome may split the agents across tools rather than stack them on the same one.
Three examples of tool-use equilibria:
- Agents congesting a shared retrieval endpoint until a slower but uncontended tool becomes competitive.
- LLM tool agents changing each other's context and options through a shared workspace.
- Scheduling of expensive tool calls under a common compute or token budget.
Two non-examples clarify the boundary:
- A single agent with private tools, where no other decision maker responds.
- A tool with unlimited capacity, where one agent's usage never affects another's payoff.
Proof or verification habit for tool-use equilibria:
Model the tool choice as a congestion game and check that no agent gains by unilaterally switching tools at the current load; the sketch below runs this check as best-response dynamics.
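A best-response-dynamics sketch on an illustrative two-agent, two-tool congestion game; the latency numbers are assumptions chosen so that the stable outcome splits the agents. Because congestion games admit a potential function, these dynamics are guaranteed to settle at a pure equilibrium.

```python
# Two agents choose between a fast tool, whose latency grows with load,
# and a slow tool with constant latency. Each agent minimizes its latency.
FAST_PER_USER, SLOW = 2.0, 3.0

def best_response(other_choice):
    load_if_fast = 2 if other_choice == "fast" else 1
    return "fast" if FAST_PER_USER * load_if_fast < SLOW else "slow"

choices = ["fast", "fast"]        # both agents start on the nominally fast tool
for _ in range(5):                # alternate unilateral best responses
    for i in (0, 1):
        choices[i] = best_response(choices[1 - i])

print(choices)   # ['slow', 'fast']: the load splits, and neither agent can
# lower its own latency by switching tools alone.
```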
Local diagnostic: ask which shared resource (rate limit, cache, or context) couples the agents' payoffs.
6.5 Equilibrium failure in nonstationary systems
Operational definition.
An equilibrium claim is always relative to a fixed game. When payoffs, players, action sets, or information structures drift over time, a profile that was stable in yesterday's game can carry a profitable deviation in today's.
Worked reading.
A defense tuned against last quarter's attack distribution may be an equilibrium of that game, yet once attackers acquire new techniques the best-response exploit reopens, even though the defense itself never changed.
Three examples of equilibrium failure in nonstationary systems:
- A jailbreak arms race where each patched defense shifts attackers to a new failure mode.
- Benchmark gaming that resumes as soon as the public metric is revised.
- A self-play checkpoint that was unexploitable against its training population but loses to a newly trained opponent.
Two non-examples clarify the boundary:
- Noise around a stable equilibrium of a stationary game.
- A failure caused by an implementation bug rather than by any agent's adaptation.
Proof or verification habit for equilibrium failure in nonstationary systems:
Re-run the universal deviation check whenever the game changes: track the exploitability of the deployed profile over time instead of certifying it once.
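An illustrative simulation, not a claim about any particular system: the defender's mixed strategy is frozen while the attacker's payoff matrix drifts, and a simple exploitability proxy tends to widen. All matrices are synthetic random draws.

```python
import numpy as np

rng = np.random.default_rng(0)
A0 = rng.normal(size=(3, 3))    # attacker payoffs: rows = attacks, cols = defenses
defense = np.ones(3) / 3        # defender's frozen mixed strategy

for step in range(0, 101, 25):
    A_t = A0 + 0.05 * step * rng.normal(size=(3, 3))   # drift grows with time
    attack_values = A_t @ defense
    # Proxy for exploitability: the best attack's edge over an average attack.
    print(step, round(float(attack_values.max() - attack_values.mean()), 3))
# The gap tends to widen as drift accumulates: the equilibrium claim was
# checked against A0, not against the game the defender now faces.
```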
Local diagnostic: ask what has changed in the game since stability was last checked: payoffs, players, action sets, or information.