Adversarial Game Theory, Part 2: Formal Definitions through Adversarial Examples and Robust Optimization

2. Formal Definitions

Formal Definitions develops the part of adversarial game theory specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.

2.1 attacker action aAa_A

Attacker action a_A belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

R_{\mathrm{rob}}(\theta)=\mathbb{E}_{(\mathbf{x},y)}\left[\max_{\boldsymbol{\delta}\in\mathcal{S}}\mathcal{L}(f_\theta(\mathbf{x}+\boldsymbol{\delta}),y)\right].

The formula gives the mathematical handle for attacker action a_A. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Players, actions, and payoffs define the interface of a game. If any one of them is vague, the equilibrium claim is usually vague too.

Worked reading.

A payoff matrix is a compact table: rows are one player's actions, columns are another player's actions, and entries are utilities or losses induced by the joint action.
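The notebook promise of small synthetic payoff matrices makes this reading concrete. Below is a minimal NumPy sketch of a payoff matrix read as a table of joint actions; all payoff numbers are invented for illustration.

```python
# A minimal sketch, assuming a 2x3 zero-sum toy game; all payoff numbers
# are invented. Rows index defender actions, columns index attack families.
import numpy as np

defender_payoff = np.array([
    [0.8, -0.2, 0.1],   # defense A against attack families 1..3
    [0.3, 0.6, -0.4],   # defense B against attack families 1..3
])
attacker_payoff = -defender_payoff  # zero-sum convention

def payoff(i, j):
    """Utilities induced by the joint action (defense i, attack j)."""
    return defender_payoff[i, j], attacker_payoff[i, j]

print(payoff(0, 2))  # joint action: defense A meets attack family 3
```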

Three examples of attacker action a_A:

  1. A row action chooses a defense, while a column action chooses an attack family.
  2. An agent set lists every model or tool-using process that can affect reward.
  3. A utility function converts accuracy, safety, latency, and cost into strategic incentives.

Two non-examples clarify the boundary:

  1. A metric with no actor who optimizes it.
  2. An action that is impossible in deployment but included for convenience.

Proof or verification habit for attacker action a_A:

Before proving anything, audit the model specification: every allowed action must map to a payoff for every player.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, attacker action a_A is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Payoff design is AI system design. The game will faithfully optimize the incentives it is given, including bad incentives.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using attacker action a_A responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Can you name each player, enumerate or parameterize its actions, and compute its payoff from a joint action?

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Attacker action a_A gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

2.2 defender action a_D

Defender action a_D belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

a_A^*(a_D)\in\arg\max_{a_A}u_A(a_A,a_D).

The formula gives the mathematical handle for defender action a_D. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.
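The best-response map can be computed directly on a small payoff matrix. The sketch below, with invented numbers and a zero-sum reading as a working assumption, evaluates a_A^*(a_D) for each defender action and then picks the defender action that minimizes what the attacker can achieve.

```python
# A minimal sketch of a_A*(a_D) on an invented attacker payoff matrix
# u_A[a_D, a_A], read zero-sum: the defender loses what the attacker gains.
import numpy as np

u_A = np.array([
    [0.2, 0.9, 0.5],   # attacker utilities when the defender plays action 0
    [0.7, 0.1, 0.4],   # attacker utilities when the defender plays action 1
])

best_response = u_A.argmax(axis=1)   # a_A*(a_D) for each defender action
attacker_value = u_A.max(axis=1)     # what the best response achieves
a_D_star = attacker_value.argmin()   # defender minimizes the attacker's gain

print(best_response, a_D_star)       # -> [1 0] and defender action 1
```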

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Players, actions, and payoffs define the interface of a game. If any one of them is vague, the equilibrium claim is usually vague too.

Worked reading.

A payoff matrix is a compact table: rows are one player's actions, columns are another player's actions, and entries are utilities or losses induced by the joint action.

Three examples of defender action a_D:

  1. A row action chooses a defense, while a column action chooses an attack family.
  2. An agent set lists every model or tool-using process that can affect reward.
  3. A utility function converts accuracy, safety, latency, and cost into strategic incentives.

Two non-examples clarify the boundary:

  1. A metric with no actor who optimizes it.
  2. An action that is impossible in deployment but included for convenience.

Proof or verification habit for defender action a_D:

Before proving anything, audit the model specification: every allowed action must map to a payoff for every player.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, defender action a_D is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Payoff design is AI system design. The game will faithfully optimize the incentives it is given, including bad incentives.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using defender action a_D responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Can you name each player, enumerate or parameterize its actions, and compute its payoff from a joint action?

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Defender action a_D gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

2.3 utility and loss functions

Utility and loss functions belong to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\min_G\max_D \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\log D(\mathbf{x})+\mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\log(1-D(G(\mathbf{z}))).

The formula gives the mathematical handle for utility and loss functions. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.
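Exploitability makes "optimal in isolation" testable. The sketch below uses matching pennies as a toy zero-sum game and measures how much a fixed row strategy loses to a best-responding opponent relative to the game value; the fixed strategy is invented for illustration.

```python
# A minimal sketch of exploitability on matching pennies (a zero-sum toy
# game): how much does a fixed row strategy lose against a best response?
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # row player receives x^T A y

x = np.array([0.7, 0.3])      # a fixed policy that looks fine in isolation
value_vs_best_response = (x @ A).min()  # column player picks x's worst column
game_value = 0.0                        # known value of matching pennies

print(game_value - value_vs_best_response)  # exploitability of x: 0.4
```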

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Players, actions, and payoffs define the interface of a game. If any one of them is vague, the equilibrium claim is usually vague too.

Worked reading.

A payoff matrix is a compact table: rows are one player's actions, columns are another player's actions, and entries are utilities or losses induced by the joint action.

Three examples of utility and loss functions:

  1. A row action chooses a defense, while a column action chooses an attack family.
  2. An agent set lists every model or tool-using process that can affect reward.
  3. A utility function converts accuracy, safety, latency, and cost into strategic incentives.

Two non-examples clarify the boundary:

  1. A metric with no actor who optimizes it.
  2. An action that is impossible in deployment but included for convenience.

Proof or verification habit for utility and loss functions:

Before proving anything, audit the model specification: every allowed action must map to a payoff for every player.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, utility and loss functions are useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Payoff design is AI system design. The game will faithfully optimize the incentives it is given, including bad incentives.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using utility and loss functions responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Can you name each player, enumerate or parameterize its actions, and compute its payoff from a joint action?

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Utility and loss functions give the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

2.4 threat set \mathcal{S}

Threat set \mathcal{S} belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

a_A\in A_A,\qquad a_D\in A_D.

The formula gives the mathematical handle for threat set \mathcal{S}. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.

Worked reading.

For an \ell_\infty perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
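A threat set is easiest to audit as code: a membership test and a projection operator. The sketch below assumes an \ell_\infty ball with an invented budget eps.

```python
# A minimal sketch of a threat set as code, assuming an l_infinity ball
# S = {delta : ||delta||_inf <= eps} with an invented budget.
import numpy as np

EPS = 0.1

def in_threat_set(delta, eps=EPS):
    """Membership test for the l_inf ball."""
    return np.abs(delta).max() <= eps

def project(delta, eps=EPS):
    """Projection onto the l_inf ball: clip each coordinate to [-eps, eps]."""
    return np.clip(delta, -eps, eps)

delta = np.array([0.25, -0.05, 0.4])
print(in_threat_set(delta), project(delta))  # False, then a feasible delta
```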

Three examples of threat set \mathcal{S}:

  1. Image perturbations bounded by a norm.
  2. Prompt transformations allowed by a jailbreak policy.
  3. Retrieval poisoning constrained by an index-insertion budget.

Two non-examples clarify the boundary:

  1. Any attack the modeler can imagine but has not formalized.
  2. Random corruption treated as adaptive attack.

Proof or verification habit for threat set \mathcal{S}:

The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, threat set \mathcal{S} is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Adversarial training improves robustness to the modeled threat, not to every strategic behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using threat set \mathcal{S} responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Write the set \mathcal{S} before writing the max.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Threat set \mathcal{S} gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

2.5 security game and Stackelberg preview

The security game and Stackelberg preview belong to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

R_{\mathrm{rob}}(\theta)=\mathbb{E}_{(\mathbf{x},y)}\left[\max_{\boldsymbol{\delta}\in\mathcal{S}}\mathcal{L}(f_\theta(\mathbf{x}+\boldsymbol{\delta}),y)\right].

The formula gives the mathematical handle for the security game and Stackelberg preview. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

Security games often have timing: a defender commits to a randomized allocation, then an attacker chooses a best response.

Worked reading.

A defender with two monitors and three targets chooses coverage probabilities; the attacker chooses the target with highest expected utility after observing the commitment rule.
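A brute-force version of this worked reading fits in a few lines. The sketch below assumes invented target values, two monitors, and a zero-sum payoff; the defender searches a coverage grid, and the attacker best-responds to each commitment.

```python
# A minimal sketch of the two-monitor, three-target reading. Target values
# and the zero-sum payoff are invented; the defender commits to coverage
# probabilities c, and the attacker best-responds after observing c.
import numpy as np
from itertools import product

values = np.array([10.0, 6.0, 3.0])   # attacker gain at each uncovered target

def attacker_utility(c):
    return (1 - c) * values           # caught with probability c[t] -> payoff 0

best_c, best_def = None, -np.inf
grid = np.linspace(0, 1, 11)
for c in product(grid, repeat=3):     # brute-force search over commitments
    c = np.array(c)
    if c.sum() > 2:                   # only two monitors to allocate
        continue
    t = attacker_utility(c).argmax()          # follower best response
    defender_u = -attacker_utility(c)[t]      # zero-sum defender payoff
    if defender_u > best_def:
        best_def, best_c = defender_u, c

print(best_c, best_def)
```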

Three examples for the security game and Stackelberg preview:

  1. Random audits over model outputs.
  2. Rate-limit allocation over API endpoints.
  3. Canary documents placed to detect extraction.

Two non-examples clarify the boundary:

  1. A simultaneous zero-sum matrix game with no commitment.
  2. A fixed checklist that attackers cannot observe or learn from.

Proof or verification habit for the security game and Stackelberg preview:

Stackelberg analysis proves optimal commitment by solving the follower's best-response constraints inside the leader's optimization.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the security game and Stackelberg preview are useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

For AI security, commitment and observability matter because attackers often adapt after seeing public defenses.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the security game and Stackelberg preview responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: State what the attacker knows about the defense.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The security game and Stackelberg preview give the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3. Adversarial Examples and Robust Optimization

Adversarial Examples and Robust Optimization develops the part of adversarial game theory specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.

3.1 perturbation sets

Perturbation sets belong to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

a_A^*(a_D)\in\arg\max_{a_A}u_A(a_A,a_D).

The formula gives the mathematical handle for perturbation sets. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.

Worked reading.

For an \ell_\infty perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
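The same budget defines very different sets under different norms, and the projection operator is what encodes the set in practice. The sketch below contrasts \ell_\infty and \ell_2 projections; the radius is an assumption chosen for illustration.

```python
# A minimal sketch contrasting two perturbation sets at the same invented
# radius: the projection operator encodes the set.
import numpy as np

def project_linf(delta, eps):
    return np.clip(delta, -eps, eps)        # clip coordinates independently

def project_l2(delta, eps):
    norm = np.linalg.norm(delta)
    return delta if norm <= eps else delta * (eps / norm)  # rescale the vector

delta = np.array([0.3, -0.3, 0.3])
print(project_linf(delta, 0.1))   # [ 0.1 -0.1  0.1]
print(project_l2(delta, 0.1))     # same direction, shrunk to length 0.1
```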

Three examples of perturbation sets:

  1. Image perturbations bounded by a norm.
  2. Prompt transformations allowed by a jailbreak policy.
  3. Retrieval poisoning constrained by an index-insertion budget.

Two non-examples clarify the boundary:

  1. Any attack the modeler can imagine but has not formalized.
  2. Random corruption treated as adaptive attack.

Proof or verification habit for perturbation sets:

The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, perturbation sets are useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Adversarial training improves robustness to the modeled threat, not to every strategic behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using perturbation sets responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Write the set \mathcal{S} before writing the max.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Perturbation sets give the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.2 inner maximization

Inner maximization belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

\min_G\max_D \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\log D(\mathbf{x})+\mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\log(1-D(G(\mathbf{z}))).

The formula gives the mathematical handle for inner maximization. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.

Worked reading.

For an \ell_\infty perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
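When the threat set is finite (or discretized), the inner maximization can be done exactly by enumeration rather than by gradient steps. The sketch below uses a toy scalar model and an invented discretized set S.

```python
# A minimal sketch of exact inner maximization over a finite (discretized)
# threat set, using a toy scalar model f_theta(x) = theta * x. All numbers
# are invented.
import numpy as np

def loss(theta, x, y, delta):
    return (theta * (x + delta) - y) ** 2

S = np.linspace(-0.1, 0.1, 21)    # discretized threat set

def inner_max(theta, x, y):
    losses = np.array([loss(theta, x, y, d) for d in S])
    return S[losses.argmax()], losses.max()   # worst delta and worst loss

print(inner_max(theta=2.0, x=1.0, y=1.5))     # -> (0.1, 0.49)
```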

Three examples of inner maximization:

  1. Image perturbations bounded by a norm.
  2. Prompt transformations allowed by a jailbreak policy.
  3. Retrieval poisoning constrained by an index-insertion budget.

Two non-examples clarify the boundary:

  1. Any attack the modeler can imagine but has not formalized.
  2. Random corruption treated as adaptive attack.

Proof or verification habit for inner maximization:

The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, inner maximization is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Adversarial training improves robustness to the modeled threat, not to every strategic behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using inner maximization responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Write the set \mathcal{S} before writing the max.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Inner maximization gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.3 outer minimization

Outer minimization belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

a_A\in A_A,\qquad a_D\in A_D.

The formula gives the mathematical handle for outer minimization. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.

Worked reading.

For an \ell_\infty perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
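The outer minimization alternates the two moves: solve (or approximate) the inner maximum, then take a gradient step on \theta. The sketch below uses a toy scalar model with an analytic gradient; the data, budget, and learning rate are invented for illustration.

```python
# A minimal sketch of the outer minimization: approximate the inner maximum,
# then descend on theta (a Danskin-style step). Toy scalar model; the data,
# budget, and learning rate are invented.
import numpy as np

EPS, LR = 0.1, 0.05
data = [(1.0, 1.5), (2.0, 3.1), (-1.0, -0.9)]   # invented (x, y) pairs

def worst_delta(theta, x, y):
    # the quadratic loss is convex in delta, so the max over [-EPS, EPS]
    # sits at a boundary point
    cands = np.array([-EPS, EPS])
    return cands[np.argmax((theta * (x + cands) - y) ** 2)]

theta = 0.0
for _ in range(200):
    grads = []
    for x, y in data:
        d = worst_delta(theta, x, y)                        # inner max first
        grads.append(2 * (theta * (x + d) - y) * (x + d))   # dL/dtheta at d
    theta -= LR * np.mean(grads)                            # outer descent step

print(theta)   # robustly fitted slope for the toy data
```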

Three examples of outer minimization:

  1. Image perturbations bounded by a norm.
  2. Prompt transformations allowed by a jailbreak policy.
  3. Retrieval poisoning constrained by an index-insertion budget.

Two non-examples clarify the boundary:

  1. Any attack the modeler can imagine but has not formalized.
  2. Random corruption treated as adaptive attack.

Proof or verification habit for outer minimization:

The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, outer minimization is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Adversarial training improves robustness to the modeled threat, not to every strategic behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using outer minimization responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Write the set \mathcal{S} before writing the max.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Outer minimization gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.4 PGD attack preview

The PGD attack preview belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

R_{\mathrm{rob}}(\theta)=\mathbb{E}_{(\mathbf{x},y)}\left[\max_{\boldsymbol{\delta}\in\mathcal{S}}\mathcal{L}(f_\theta(\mathbf{x}+\boldsymbol{\delta}),y)\right].

The formula gives the mathematical handle for the PGD attack preview. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.

Worked reading.

For an \ell_\infty perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
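As a minimal sketch of that loop, the code below runs PGD against a toy logistic model with an analytic gradient; the weights, inputs, and step sizes are all invented for illustration.

```python
# A minimal sketch of PGD for an l_inf set on a toy logistic model with an
# analytic gradient; weights, inputs, and step sizes are all invented.
import numpy as np

w, b = np.array([1.5, -2.0]), 0.3
x, y = np.array([0.2, 0.4]), 1.0       # true label in {0, 1}
EPS, ALPHA, STEPS = 0.1, 0.02, 20

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

delta = np.zeros_like(x)
for _ in range(STEPS):
    p = sigmoid(w @ (x + delta) + b)
    grad = (p - y) * w                     # d(cross-entropy)/d(delta)
    delta = delta + ALPHA * np.sign(grad)  # ascend the loss
    delta = np.clip(delta, -EPS, EPS)      # project back into the box

print(delta, sigmoid(w @ (x + delta) + b))  # perturbation and degraded p(y=1)
```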

Three examples for the PGD attack preview:

  1. Image perturbations bounded by a norm.
  2. Prompt transformations allowed by a jailbreak policy.
  3. Retrieval poisoning constrained by an index-insertion budget.

Two non-examples clarify the boundary:

  1. Any attack the modeler can imagine but has not formalized.
  2. Random corruption treated as adaptive attack.

Proof or verification habit for the PGD attack preview:

The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, the PGD attack preview is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Adversarial training improves robustness to the modeled threat, not to every strategic behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using the PGD attack preview responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Write the set \mathcal{S} before writing the max.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. The PGD attack preview gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective

3.5 robust risk

Robust risk belongs to the canonical scope of Adversarial Game Theory. The central object is not a single optimizer but a system of decision makers whose objectives interact.

For this subsection, the working scope is attacker-defender games, threat sets, robust optimization, Stackelberg security games, adversarial examples, and adaptive evaluation. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.

a_A^*(a_D)\in\arg\max_{a_A}u_A(a_A,a_D).

The formula gives the mathematical handle for robust risk. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.

Game object | Meaning | AI interpretation
Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent
Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample
Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy
Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget
Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game

Operational definition.

A threat model defines the attacker's allowed moves. Robust optimization then trains or evaluates against the worst allowed move.

Worked reading.

For an \ell_\infty perturbation set, PGD repeatedly steps in the gradient-sign direction and projects back into the allowed box.
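Robust risk is the expectation of the inner maximum, so a Monte Carlo estimate just averages worst-case losses over sampled pairs. The sketch below uses an invented linear data-generating process and a discretized threat set.

```python
# A minimal sketch of a Monte Carlo estimate of R_rob(theta): average the
# inner maximum over sampled (x, y). Data process and threat set invented.
import numpy as np

rng = np.random.default_rng(0)
THETA, EPS = 2.0, 0.1
S = np.linspace(-EPS, EPS, 21)   # discretized threat set

def worst_case_loss(x, y):
    return max((THETA * (x + d) - y) ** 2 for d in S)

xs = rng.normal(size=100)
ys = 2.0 * xs + rng.normal(scale=0.1, size=100)   # invented generating process

robust_risk = np.mean([worst_case_loss(x, y) for x, y in zip(xs, ys)])
clean_risk = np.mean((THETA * xs - ys) ** 2)

print(clean_risk, robust_risk)   # the robust risk upper-bounds the clean risk
```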

Three examples of robust risk:

  1. Image perturbations bounded by a norm.
  2. Prompt transformations allowed by a jailbreak policy.
  3. Retrieval poisoning constrained by an index-insertion budget.

Two non-examples clarify the boundary:

  1. Any attack the modeler can imagine but has not formalized.
  2. Random corruption treated as adaptive attack.

Proof or verification habit for robust risk:

The nested objective is proved meaningful only after the feasible attack set is stated. The inner maximum is over that set, not over all possible bad events.

single-agent optimization:    choose theta to minimize L(theta)
game-theoretic optimization:  choose pi_i while others choose pi_-i
adversarial objective:        choose defense against best attack
multi-agent learning:         policies change the environment itself

In AI systems, robust risk is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.

Adversarial training improves robustness to the modeled threat, not to every strategic behavior.

Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.

Checklist for using robust risk responsibly:

  • State the players and their objectives.
  • State the action spaces and information structure.
  • Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
  • Identify pure, mixed, or policy strategies.
  • Compute best responses or exploitability before claiming stability.
  • Separate equilibrium analysis from welfare analysis.
  • Explain what changes if opponents adapt.

Local diagnostic: Write the set \mathcal{S} before writing the max.

This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.

Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Robust risk gives the language to reason about that pressure.

A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.

Diagnostic question | Game-theoretic discipline it tests
Who can respond? | Player modeling
What can they change? | Action space
What do they want? | Payoff design
Can one side commit first? | Stackelberg structure
Is the worst case important? | Minimax or robust objective
