Adversarial Game Theory: Part 1: Intuition
1. Intuition
This part develops the intuition for adversarial game theory specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.
1.1 Attacker-defender thinking
Attacker-defender thinking belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.
For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.
The nested objective min_theta max_{a in A} L(theta, a) gives the mathematical handle for attacker-defender thinking: the defender chooses parameters theta, and the attacker answers with the worst feasible move a from the threat set A. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.
| Game object | Meaning | AI interpretation |
|---|---|---|
| Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent |
| Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample |
| Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy |
| Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget |
| Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game |
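The objects in the table can be pinned down as a small data structure. A minimal Python sketch; the guard-a-channel game and all names here are illustrative, not from the chapter:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatrixGame:
    """A two-player normal-form game: payoffs[i][a][b] is player i's
    payoff when player 0 plays action a and player 1 plays action b."""
    actions: tuple   # action labels shared by both players
    payoffs: tuple   # (payoff matrix for player 0, payoff matrix for player 1)

# Toy attacker-defender game: the defender picks a channel to guard,
# the attacker picks a channel to hit. Zero-sum: the defender scores +1
# when the attack hits the guarded channel, -1 when it slips through.
guard = MatrixGame(
    actions=("channel_A", "channel_B"),
    payoffs=(((1, -1), (-1, 1)),    # defender's payoffs
             ((-1, 1), (1, -1))),   # attacker's payoffs (the negation)
)
```

Writing the payoffs down explicitly already enforces the table's discipline: the players, actions, and objectives must all be stated before any stability claim can be made.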
Operational definition.
Strategic interaction begins when another decision maker can react to the policy being studied. Stability means the modeled behavior remains defensible after that reaction is allowed.
Worked reading.
Start with one proposed joint behavior. Freeze everyone except one player, compute that player's best alternative, then repeat for every player. If a profitable switch exists, the behavior is not strategically stable.
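The worked reading is directly executable for a finite two-player game. A minimal sketch of the deviation check; the prisoner's dilemma payoffs below are a standard illustration, not from the text:

```python
def has_profitable_deviation(payoffs, profile):
    """payoffs[i][a][b]: payoff to player i at joint actions (a, b).
    profile: the proposed joint behavior (a, b). Returns True if either
    player can gain by switching alone, i.e. the profile is not a pure
    Nash equilibrium of the stated game."""
    a, b = profile
    n_actions = len(payoffs[0])
    # Freeze player 1 at b, scan player 0's alternatives to a.
    if any(payoffs[0][a2][b] > payoffs[0][a][b] for a2 in range(n_actions)):
        return True
    # Freeze player 0 at a, scan player 1's alternatives to b.
    if any(payoffs[1][a][b2] > payoffs[1][a][b] for b2 in range(n_actions)):
        return True
    return False

# Prisoner's dilemma: action 0 = cooperate, 1 = defect.
pd = (((3, 0), (5, 1)),   # row player's payoffs
      ((3, 5), (0, 1)))   # column player's payoffs

# Mutual cooperation admits a profitable switch; mutual defection does not.
```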
Three examples of attacker-defender thinking:
- A guardrail remains effective after attackers see examples of blocked prompts.
- A model-routing policy remains attractive after providers update prices.
- A self-play policy cannot be easily exploited by a newly trained opponent.
Two non-examples clarify the boundary:
- A high average score on a fixed dataset.
- A local minimum of one model's loss with no opponent.
Proof or verification habit for attacker-defender thinking:
The verification habit is adversarial: search for profitable deviations rather than only confirming the proposed behavior works in the original scenario.
- Single-agent optimization: choose theta to minimize L(theta).
- Game-theoretic optimization: choose pi_i while the other players choose pi_-i.
- Adversarial objective: choose the defense against the best attack.
- Multi-agent learning: policies change the environment itself.
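The gap between the first and last regimes can be seen on the simplest saddle objective L(x, y) = x * y, where the defender lowers L in x while the attacker raises it in y. A toy sketch; the step size and horizon are arbitrary:

```python
def frozen_descent(x, lr=0.1, steps=200):
    """Single-agent view: the opponent is frozen at y = 1, so
    dL/dx = 1 and plain descent just walks x downhill."""
    for _ in range(steps):
        x = x - lr * 1.0
    return x

def gradient_play(x, y, lr=0.1, steps=200):
    """Game view: simultaneous gradient descent/ascent on L(x, y) = x * y.
    Each player's update changes the other's effective objective."""
    for _ in range(steps):
        gx, gy = y, x                      # dL/dx = y, dL/dy = x
        x, y = x - lr * gx, y + lr * gy
    return x, y

x_frozen = frozen_descent(1.0)             # descends without resistance
x_end, y_end = gradient_play(1.0, 1.0)     # spirals away from the saddle
```

With the opponent frozen, descent works; with both players adapting, simultaneous gradient play spirals outward from the saddle at (0, 0) instead of settling, which is exactly the "policies change the environment" effect the list describes.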
In AI systems, attacker-defender thinking is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.
This is the mathematical shift from offline ML to deployed AI systems where users, competitors, and automated attacks learn from the model.
Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.
Checklist for using attacker-defender thinking responsibly:
- State the players and their objectives.
- State the action spaces and information structure.
- Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
- Identify pure, mixed, or policy strategies.
- Compute best responses or exploitability before claiming stability.
- Separate equilibrium analysis from welfare analysis.
- Explain what changes if opponents adapt.
Local diagnostic: State the adaptation channel: what can the other side observe, change, and optimize?
This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.
Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Attacker-defender thinking gives the language to reason about that pressure.
A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.
| Diagnostic question | Game-theoretic discipline it tests |
|---|---|
| Who can respond? | Player modeling |
| What can they change? | Action space |
| What do they want? | Payoff design |
| Can one side commit first? | Stackelberg structure |
| Is the worst case important? | Minimax or robust objective |
1.2 Threat models
A threat model belongs to the canonical scope of Adversarial Game Theory: it is the formal statement of what the adversary is allowed to observe, change, and optimize, together with any budget on those moves. Every robustness claim is relative to such a set; "robust" with no stated threat set is an empty claim. The nested objective min_theta max_{a in A} L(theta, a) makes the dependence explicit: the inner maximum ranges only over the stated feasible set A.
Operational definition.
A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.
Worked reading.
For an l_inf perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
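That reading can be sketched in one dimension; a toy scalar loss stands in for a model, and eps, the step size, and the loss itself are illustrative, not a real image attack:

```python
def pgd_linf(loss_grad, x0, eps, step, iters):
    """Projected gradient ascent for the inner maximization: repeatedly
    step in the gradient-sign direction to raise the loss, then clip the
    point back into the allowed box [x0 - eps, x0 + eps]."""
    x = x0
    for _ in range(iters):
        g = loss_grad(x)
        x = x + step * (1.0 if g > 0 else -1.0)   # ascend the loss
        x = max(x0 - eps, min(x0 + eps, x))       # project into the box
    return x

# Toy loss L(x) = (x - 2)**2 around clean input x0 = 0. Within the box
# [-0.5, 0.5], the worst allowed point is the left edge x = -0.5.
x_adv = pgd_linf(lambda x: 2 * (x - 2), x0=0.0, eps=0.5, step=0.1, iters=20)
```

The attack lands on the boundary point of the feasible box with the highest loss, not on an arbitrary corruption, which is the whole content of the threat model.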
Three examples of threat models:
- Image perturbations bounded by a norm.
- Prompt transformations allowed by a jailbreak policy.
- Retrieval poisoning constrained by an index-insertion budget.
Two non-examples clarify the boundary:
- Any attack the modeler can imagine but has not formalized.
- Random corruption treated as an adaptive attack.
Proof or verification habit for threat models:
The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.
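The same habit is executable: state the feasible set first, then take the inner maximum over it, and note that enlarging the set can only raise the worst case. A toy sketch with an illustrative loss:

```python
def worst_case_loss(loss, attack_set):
    """Inner maximum of the robust objective: the adversary picks the
    worst attack from the *stated* feasible set, nothing outside it."""
    return max(loss(a) for a in attack_set)

loss = lambda a: a * a           # toy loss as a function of attack strength
stated = [-1.0, 0.0, 1.0]        # the formalized threat set
enlarged = stated + [2.0]        # a strictly richer threat model

inner_max_stated = worst_case_loss(loss, stated)      # 1.0
inner_max_enlarged = worst_case_loss(loss, enlarged)  # 4.0
```

The comparison makes the verification habit concrete: a worst-case guarantee is only as strong as the set it was taken over.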
In AI systems, explicit threat models are useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.
Adversarial training improves robustness to the modeled threat, not to every strategic behavior.
Local diagnostic: Write the set before writing the max.
1.3 Strategic adaptation
Strategic adaptation is the dynamic side of attacker-defender thinking: once a policy is deployed, the other decision makers observe it and re-optimize against it, so the environment the policy faces is partly a product of the policy itself. The question shifts from whether a joint behavior is stable on paper to whether it stays defensible while opponents keep adapting.
Operational definition.
Strategic adaptation occurs when another decision maker can observe the deployed policy, change its own behavior, and optimize against what it sees. Stability means the policy remains defensible after that loop is allowed to run.
Worked reading.
Run the deviation check as a process rather than a one-shot test: let each player in turn switch to its best response against the others' current behavior, and watch whether joint play settles, cycles, or drifts. A policy is robust to adaptation only if it survives this loop.
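The loop can be simulated. In matching pennies, alternating pure best responses never settle, a standard example of adaptation producing a chase rather than an equilibrium; a minimal sketch:

```python
def best_response_cycle(steps=8):
    """Alternating pure best responses in matching pennies. The matcher
    wants to match the mismatcher's action; the mismatcher wants to
    differ. Each round, one player best-responds to the other's last move."""
    history = []
    matcher, mismatcher = 0, 0          # actions in {0, 1}
    for t in range(steps):
        if t % 2 == 0:
            mismatcher = 1 - matcher    # mismatcher flips away
        else:
            matcher = mismatcher        # matcher copies
        history.append((matcher, mismatcher))
    return history

play = best_response_cycle()
# Joint play visits all four profiles and returns to where it started:
# naive adaptation produces a cycle, not a stable point.
```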
In AI systems, reasoning about strategic adaptation is unavoidable because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.
1.4 Robust vs average-case performance
Robust and average-case performance answer different questions about the same system. Average-case performance scores behavior under a fixed distribution; robust performance scores it under the worst feasible move of an adversary, so the two can disagree sharply. The nested objective min_theta max_{a in A} L(theta, a) is the robust reading: the expectation of offline evaluation is replaced by an inner maximum over the threat set A.
Operational definition.
Average-case performance is the expected loss under a fixed data distribution. Robust performance is the worst loss over the stated threat set; robust optimization trains or evaluates against that worst allowed move.
Worked reading.
Read min_theta max_{a in A} L(theta, a) from the inside out: first the adversary picks the worst allowed move a against the current theta, then the defender chooses theta against that worst case.
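The disagreement between the two scores shows up even in a toy comparison of a brittle high-accuracy model and a flat mediocre one; all numbers here are illustrative:

```python
def avg_loss(loss, inputs):
    """Average-case: mean loss on the clean inputs."""
    return sum(loss(x) for x in inputs) / len(inputs)

def robust_loss(loss, inputs, attacks):
    """Robust: worst loss over the stated attack set, averaged over inputs."""
    return sum(max(loss(x + a) for a in attacks) for x in inputs) / len(inputs)

sharp = lambda x: 0.0 if abs(x) < 0.1 else 1.0   # perfect clean, brittle
flat = lambda x: 0.2                              # mediocre everywhere

inputs = [0.0, 0.0, 0.0]          # clean evaluation points
attacks = [-0.3, 0.0, 0.3]        # stated perturbation set

avg_sharp = avg_loss(sharp, inputs)                  # 0.0
avg_flat = avg_loss(flat, inputs)
rob_sharp = robust_loss(sharp, inputs, attacks)      # 1.0
rob_flat = robust_loss(flat, inputs, attacks)
```

The sharp model wins the average-case comparison and loses the robust one; neither score alone determines the other.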
Proof or verification habit for robust vs average-case performance:
The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.
In AI systems, the robust vs average-case distinction matters because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.
1.5 Adversarial AI as game design
Adversarial AI can be read as game design: the defender does not merely pick a policy inside a fixed game, it chooses the game itself, i.e. the action sets, information structure, payoffs, and commitment order that attackers will then optimize against. Guardrails, rate limits, randomized defenses, and evaluation protocols are all rule and payoff choices, and each should be judged by the best response it provokes.
Operational definition.
Game design is the inverse of equilibrium analysis: instead of predicting behavior in a fixed game, choose the rules so that the behavior you want is stable in the game the attacker actually faces.
Worked reading.
Before shipping a defense, run the deviation check on the designed game: freeze the rules, compute the attacker's best response, and ask whether the intended behavior survives it. If the attacker's best response still wins, the design, not the play, is what must change.
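One standard design lever is defense randomization. In a toy 2x2 zero-sum guard-a-channel game, a committed pure strategy is exploitable while a 50/50 mixture guarantees the game's value; a minimal sketch with illustrative payoffs:

```python
def defender_value(p, M):
    """Expected defender payoff when the defender guards channel A with
    probability p and the attacker best-responds, i.e. picks whichever
    column is worse for the defender."""
    col_A = p * M[0][0] + (1 - p) * M[1][0]   # attacker hits channel A
    col_B = p * M[0][1] + (1 - p) * M[1][1]   # attacker hits channel B
    return min(col_A, col_B)

# Defender payoffs: +1 if the attack hits the guarded channel,
# -1 if it slips through the unguarded one.
M = ((1, -1), (-1, 1))

pure = defender_value(1.0, M)    # always guard A: fully exploitable
mixed = defender_value(0.5, M)   # randomize 50/50: guarantees the value
```

The design lesson is that the improvement comes from changing the defender's strategy space (allowing randomization), not from playing the fixed game better.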
In AI systems, treating adversarial AI as game design is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.