Nash Equilibria: Part 2: Formal Definitions
2. Formal Definitions
Formal Definitions develops the portion of Nash equilibria specified by the approved Chapter 23 table of contents. The treatment is game-theoretic, not merely an optimization recipe.
2.1 Normal-form game
Normal-form game belongs to the canonical scope of Nash Equilibria. The central object is not a single optimizer but a system of decision makers whose objectives interact.
For this subsection, the working scope is normal-form games, pure and mixed strategies, best responses, Nash equilibria, existence, computation, and AI equilibrium failures. We use players, action sets, strategies, payoffs, and response rules. The key question is whether a proposed behavior is stable when another agent adapts.
The tuple G = (N, {A_i}, {u_i}) gives the mathematical handle for a normal-form game: a finite player set N, an action set A_i for each player i, and a payoff function u_i: A_1 x ... x A_n -> R for each player. In game theory, this expression should always be read with the opponent's decision rule in mind. A policy that is optimal in isolation may be exploitable once another player observes and responds to it.
| Game object | Meaning | AI interpretation |
|---|---|---|
| Player | Decision maker with an objective | Model, user, attacker, defender, generator, evaluator, tool-using agent |
| Action | Choice available to a player | Prompt, route, attack, defense, bid, policy update, generated sample |
| Strategy | Rule or distribution over actions | Stochastic policy, decoding policy, defense randomization, routing policy |
| Payoff | Utility or negative loss | Accuracy, reward, cost, safety score, exploitability, compute budget |
| Equilibrium | Stable joint behavior | No agent can improve by changing alone under the stated game |
Operational definition.
A normal-form game freezes timing and information into one payoff table. The table is useful when each player chooses an action once, or when a larger system can be locally summarized by a simultaneous strategic choice.
Worked reading.
For two players, each with two actions, a payoff table assigns a payoff pair (u_1, u_2) to each of the four cells. A candidate outcome is one cell; a candidate mixed profile is a distribution over rows and columns.
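A minimal sketch of this reading in plain Python, with hypothetical payoff numbers and helper names (`cell_payoffs`, `expected_payoffs`) of our own choosing:

```python
# A 2x2 normal-form game as two nested lists (payoff numbers are hypothetical).
# A[i][j] is the row player's payoff and B[i][j] the column player's payoff
# when row action i meets column action j.
A = [[2.0, 0.0],
     [1.0, 3.0]]
B = [[1.0, 2.0],
     [3.0, 0.0]]

def cell_payoffs(i, j):
    """Payoff pair at one cell: a candidate pure outcome."""
    return A[i][j], B[i][j]

def expected_payoffs(p, q):
    """Expected payoff pair under a mixed profile:
    p is a distribution over rows, q a distribution over columns."""
    u1 = sum(p[i] * q[j] * A[i][j] for i in range(2) for j in range(2))
    u2 = sum(p[i] * q[j] * B[i][j] for i in range(2) for j in range(2))
    return u1, u2
```

A pure outcome reads off a single cell; a mixed profile averages over all four cells with the product weights p[i] * q[j].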
Three examples of normal-form games:
- A model router and a model provider choose prices simultaneously.
- A generator and discriminator pick local update directions in one training round.
- A defender and attacker pick a guardrail and a prompt class before observing the exact prompt.
Two non-examples clarify the boundary:
- A single gradient step with no opponent.
- A sequential decision process where the second mover observes the first action.
Proof or verification habit for normal-form game:
The proof habit is bookkeeping: verify that the tuple (N, {A_i}, {u_i}) contains all players, action sets, and payoff maps before asking about equilibrium.
Four regimes worth distinguishing:
- single-agent optimization: choose theta to minimize L(theta)
- game-theoretic optimization: choose pi_i while others choose pi_-i
- adversarial objective: choose defense against best attack
- multi-agent learning: policies change the environment itself
In AI systems, the normal-form game is useful because modern models are deployed into adaptive environments: users learn prompt tricks, attackers search for failures, evaluators change rubrics, and other agents compete for resources.
Normal-form abstraction is the small table behind larger AI systems. It lets us ask which local incentives a training loop or deployment mechanism creates.
Notebook implementation will use small synthetic payoff matrices and learning dynamics. This keeps the mathematics executable while avoiding external datasets or heavyweight game solvers.
Checklist for using normal-form game responsibly:
- State the players and their objectives.
- State the action spaces and information structure.
- Decide whether the game is zero-sum, general-sum, cooperative, or adversarial.
- Identify pure, mixed, or policy strategies.
- Compute best responses or exploitability before claiming stability.
- Separate equilibrium analysis from welfare analysis.
- Explain what changes if opponents adapt.
Local diagnostic: If timing, hidden information, or state transitions matter, normal form is a model reduction rather than the full game.
This chapter follows Chapter 22 by adding strategic adaptation. Causal inference asks what happens when we intervene. Game theory asks what happens when other decision makers anticipate or respond to that intervention.
Modern AI makes the distinction practical. A deployed model can be optimized against by users, attackers, competitors, automated evaluators, and other models. Normal-form game gives the language to reason about that pressure.
A final diagnostic question is whether a decision remains good after another agent learns from it. If not, the analysis needs game theory, not just prediction, causality, or optimization.
| Diagnostic question | Game-theoretic discipline it tests |
|---|---|
| Who can respond? | Player modeling |
| What can they change? | Action space |
| What do they want? | Payoff design |
| Can one side commit first? | Stackelberg structure |
| Is the worst case important? | Minimax or robust objective |
2.2 Players, actions, and payoffs
Operational definition.
Players, actions, and payoffs define the interface of a game. If any one of them is vague, the equilibrium claim is usually vague too.
Worked reading.
A payoff matrix is a compact table: rows are one player's actions, columns are another player's actions, and entries are utilities or losses induced by the joint action.
Three examples of players, actions, and payoffs:
- A row action chooses a defense, while a column action chooses an attack family.
- An agent set lists every model or tool-using process that can affect reward.
- A utility function converts accuracy, safety, latency, and cost into strategic incentives.
Two non-examples clarify the boundary:
- A metric with no actor who optimizes it.
- An action that is impossible in deployment but included for convenience.
Proof or verification habit for players actions and payoffs:
Before proving anything, audit the model specification: every allowed action must map to a payoff for every player.
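That audit can be mechanized. The sketch below assumes a dictionary-based game specification (the player names, actions, and payoff numbers are hypothetical) and returns every missing (joint action, player) entry:

```python
from itertools import product

def audit_game(players, actions, payoffs):
    """Check that every joint action maps to a payoff for every player.

    players: list of player names
    actions: dict player -> list of that player's actions
    payoffs: dict joint-action tuple -> dict player -> payoff
    Returns the list of (joint_action, player) pairs with missing payoffs.
    """
    missing = []
    for joint in product(*(actions[p] for p in players)):
        for p in players:
            if joint not in payoffs or p not in payoffs[joint]:
                missing.append((joint, p))
    return missing

# Hypothetical defender/attacker game with one entry left out on purpose.
players = ["defender", "attacker"]
actions = {"defender": ["filter", "audit"], "attacker": ["inject", "probe"]}
payoffs = {
    ("filter", "inject"): {"defender": 1, "attacker": -1},
    ("filter", "probe"):  {"defender": 0, "attacker": 0},
    ("audit", "inject"):  {"defender": 2, "attacker": -2},
    # ("audit", "probe") is missing: the audit should flag it.
}
```

An empty return value is the precondition for any equilibrium claim about the specification.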
Payoff design is AI system design. The game will faithfully optimize the incentives it is given, including bad incentives.
Local diagnostic: Can you name each player, enumerate or parameterize its actions, and compute its payoff from a joint action?
2.3 Pure strategy
Operational definition.
A pure strategy chooses one action deterministically. Best-response tables mark which pure actions are optimal against each opponent action.
Worked reading.
For every column, highlight the row entries with maximal row payoff; for every row, highlight the column entries with maximal column payoff. A cell highlighted for both players is a pure Nash equilibrium.
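The double-highlighting procedure is a short enumeration; a sketch in Python (the `pure_nash` helper name is ours):

```python
def pure_nash(A, B):
    """Enumerate pure Nash equilibria of a bimatrix game.

    A[i][j]: row player's payoff, B[i][j]: column player's payoff.
    Cell (i, j) is a pure Nash equilibrium when row i is a best response
    to column j AND column j is a best response to row i.
    """
    n, m = len(A), len(A[0])
    equilibria = []
    for i in range(n):
        for j in range(m):
            row_best = A[i][j] >= max(A[k][j] for k in range(n))
            col_best = B[i][j] >= max(B[i][k] for k in range(m))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria
```

On a coordination game it finds both diagonal cells; on matching pennies it correctly returns an empty list, which is the cue to search for mixed equilibria.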
Three examples of pure strategy:
- A deterministic guardrail mode.
- A fixed model route.
- A single action chosen by each player in a coordination game.
Two non-examples clarify the boundary:
- A probability distribution over actions.
- A randomized audit policy.
Proof or verification habit for pure strategy:
The proof is finite enumeration: compare every row within a column and every column within a row.
Pure-strategy analysis is the fastest sanity check before moving to mixed strategies or dynamic learning.
Local diagnostic: If no cell is jointly best-response highlighted, search for mixed equilibria instead of forcing a pure one.
2.4 Mixed strategy
Operational definition.
A mixed strategy is a probability distribution over actions. In equilibrium, actions used with positive probability must usually give the same expected payoff; otherwise probability can move to the better action.
Worked reading.
In matching pennies, the row player is indifferent only when the column player randomizes heads and tails equally. The same calculation makes the column player indifferent, giving the equilibrium.
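The matching-pennies calculation is an instance of a general 2x2 indifference equation; a sketch under the assumption that the two rows can actually be made indifferent (the `indifference_mix` name is ours):

```python
def indifference_mix(A):
    """Column mix (q, 1 - q) that makes the row player indifferent
    between its two rows in a 2x2 game with row payoffs A.

    Solves A[0][0]*q + A[0][1]*(1-q) = A[1][0]*q + A[1][1]*(1-q).
    """
    denom = (A[0][0] - A[0][1]) - (A[1][0] - A[1][1])
    if denom == 0:
        raise ValueError("rows cannot be made indifferent by any column mix")
    q = (A[1][1] - A[0][1]) / denom
    return q, 1 - q
```

For matching pennies the formula returns the equal split (0.5, 0.5), matching the worked reading; running the same calculation on the column player's payoffs gives the other half of the equilibrium.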
Three examples of mixed strategy:
- Randomized audits that make attackers uncertain.
- Stochastic decoding policies that prevent deterministic exploitation.
- Exploration policies in self-play where pure repetition would be exploited.
Two non-examples clarify the boundary:
- Adding noise after choosing a deterministic losing action.
- A distribution that assigns probability to an action with strictly lower payoff while another supported action is better.
Proof or verification habit for mixed strategy:
Set expected payoffs of supported actions equal, solve for probabilities, then verify unsupported actions are not better.
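This habit can be scripted as a support check; a sketch with a tolerance for floating-point ties (`verify_mixed_equilibrium` is our name):

```python
def verify_mixed_equilibrium(A, B, p, q, tol=1e-9):
    """Check the support conditions for a mixed profile (p, q).

    For each player: every action played with positive probability must earn
    the maximal expected payoff against the opponent's mix, which also rules
    out any strictly better unsupported action.
    """
    n, m = len(A), len(A[0])
    # Expected payoff of each pure row against q, and each pure column against p.
    row_vals = [sum(A[i][j] * q[j] for j in range(m)) for i in range(n)]
    col_vals = [sum(B[i][j] * p[i] for i in range(n)) for j in range(m)]
    for vals, mix in ((row_vals, p), (col_vals, q)):
        best = max(vals)
        for v, prob in zip(vals, mix):
            if prob > tol and abs(v - best) > tol:
                return False  # a supported action is not optimal
    return True
```

On matching pennies the uniform profile passes, while a deterministic row against a uniform column fails because the column player's support contains a suboptimal action.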
Mixed strategies explain why robust systems often randomize: predictability can be a vulnerability when opponents adapt.
Local diagnostic: Check both support equality and off-support inequalities.
2.5 Nash equilibrium
Operational definition.
A Nash equilibrium is a profile of strategies where no player can improve by changing its own strategy while all other strategies remain fixed.
Worked reading.
In the prisoner's dilemma payoff convention, mutual defection can be a Nash equilibrium even when mutual cooperation is better for both players. This is the central warning: stability and desirability are different properties.
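A sketch of that reading with the usual (here hypothetical) prisoner's dilemma numbers 3, 0, 5, 1:

```python
# Prisoner's dilemma in the higher-is-better convention:
# action 0 = cooperate, action 1 = defect (payoff numbers are hypothetical).
A = [[3, 0],
     [5, 1]]          # row player's payoffs
B = [[3, 5],
     [0, 1]]          # column player's payoffs

def is_pure_nash(i, j):
    """No unilateral deviation improves either player's payoff."""
    row_ok = all(A[i][j] >= A[k][j] for k in range(2))
    col_ok = all(B[i][j] >= B[i][k] for k in range(2))
    return row_ok and col_ok
```

Mutual defection (1, 1) passes the deviation check even though mutual cooperation (0, 0) pays both players more: stability and desirability come apart exactly as the text warns.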
Three examples of Nash equilibrium:
- A self-play policy pair where neither side has a profitable unilateral exploit.
- A GAN fixed point where the generator distribution matches data and the discriminator cannot improve classification.
- A routing market where no model provider benefits from changing only its bid.
Two non-examples clarify the boundary:
- A high-welfare outcome with a profitable unilateral deviation.
- A training checkpoint with low loss but a large best-response exploit.
Proof or verification habit for Nash equilibrium:
The proof is a universal deviation check: for each player i, hold the others' strategies pi_-i fixed and show u_i(pi_i*, pi_-i*) >= u_i(pi_i, pi_-i*) for all allowed pi_i.
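For finite games the universal deviation check reduces to pure deviations, since a profitable mixed deviation implies a profitable pure one. A sketch that returns the largest unilateral gain, often called exploitability (`deviation_gain` is our name):

```python
def deviation_gain(A, B, p, q):
    """Largest payoff gain any player can get by deviating alone from (p, q).

    Checks only pure deviations, which suffices in finite games. The profile
    is a Nash equilibrium exactly when the returned gain is (numerically) zero.
    """
    n, m = len(A), len(A[0])
    # Current expected payoffs under the joint mixed profile.
    u1 = sum(p[i] * q[j] * A[i][j] for i in range(n) for j in range(m))
    u2 = sum(p[i] * q[j] * B[i][j] for i in range(n) for j in range(m))
    # Best pure response values against the opponent's fixed mix.
    best1 = max(sum(A[i][j] * q[j] for j in range(m)) for i in range(n))
    best2 = max(sum(B[i][j] * p[i] for i in range(n)) for j in range(m))
    return max(best1 - u1, best2 - u2)
```

This is the quantity the checklist asks for before any stability claim: a checkpoint with low loss but large deviation gain is exploitable, not stable.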
For AI agents, Nash is a stability diagnostic. It does not guarantee safety, alignment, fairness, or global efficiency unless those objectives are encoded in the game.
Local diagnostic: If one deployed model, user, or attacker changed behavior alone, would it gain?