Do Calculus: Part 4: The Three Rules

4. The Three Rules

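This part of Chapter 22 develops the three rules of do-calculus. Each rule says when a term containing the do-operator may be rewritten by adding, deleting, or exchanging an observation or an action. Each rewrite is licensed by a d-separation condition in a surgically modified (mutilated) version of the causal graph. The rules are complete: if an interventional query is identifiable from the observational distribution and the graph, some sequence of rule applications, together with ordinary probability manipulations, reduces it to an expression free of do(·). The treatment is causal, not merely predictive: the central objects are mechanisms, interventions, assumptions, and counterfactuals.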

4.1 Insertion and deletion of observations (Rule 1)

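Rule 1 states when an observation may be added to or deleted from the conditioning set of an interventional distribution. For disjoint sets of variables X, Y, Z, W in a causal DAG G,

$$P(y \mid \operatorname{do}(x), z, w) = P(y \mid \operatorname{do}(x), w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}},$$

where $G_{\overline{X}}$ is G with every arrow into X deleted. Once X is set by intervention, the model's graph is $G_{\overline{X}}$, so Rule 1 is ordinary d-separation applied to the post-intervention model: an observation of Z that is separated from Y carries no further information about Y. As everywhere in causal inference, the central move is to distinguish a statistical relation from a claim about what would happen under an intervention.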

Throughout this part, the working scope is do-operator semantics, mutilated graphs, backdoor and frontdoor criteria, identification rules, and post-identification estimation. The mathematical objects are variables, mechanisms, graphs, interventions, and assumptions. A causal claim is incomplete until all five are visible.

$$\operatorname{ATE}=\mathbb{E}[Y \mid \operatorname{do}(X=1)]-\mathbb{E}[Y \mid \operatorname{do}(X=0)].$$

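The average treatment effect (ATE) is a typical target for the three rules. Both of its terms are interventional, and identifying it means rewriting each term, rule by rule, into an expression over the observational distribution. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.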
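A minimal sketch of why the ATE is not a difference of conditionals. It assumes a hypothetical binary SCM in which Z confounds X and Y (Z → X, Z → Y, X → Y), with illustrative numbers. The helper `prob` is our own, not a library API. It enumerates the joint distribution exactly with `fractions.Fraction`. Passing `do={...}` pins the listed variables and skips their mechanisms, which is the graph surgery behind do(·).

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue  # intervened variables take only their set value
        p = F(1)
        for v in order:
            if v not in do:  # untouched mechanisms keep P(v = 1 | parents)
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


# Hypothetical SCM (illustrative numbers): Z -> X, Z -> Y, X -> Y.
# Each entry returns P(var = 1 | parents).
MECHS = {
    "Z": lambda a: F(1, 2),
    "X": lambda a: F(4, 5) if a["Z"] else F(1, 5),
    "Y": lambda a: F(1, 10) + F(3, 10) * a["X"] + F(2, 5) * a["Z"],
}
ORDER = ["Z", "X", "Y"]

# Interventional contrast: both terms come from the surgically modified model.
ate = (prob(MECHS, ORDER, {"Y": 1}, do={"X": 1})
       - prob(MECHS, ORDER, {"Y": 1}, do={"X": 0}))
# Naive contrast: both terms condition on X in the unmodified model.
naive = (prob(MECHS, ORDER, {"Y": 1}, {"X": 1})
         - prob(MECHS, ORDER, {"Y": 1}, {"X": 0}))
print("ATE:", ate, "naive contrast:", naive)
```

In this model the interventional contrast is 3/10 and the naive contrast is 27/50. Units with Z = 1 are both more likely to be treated and more likely to have Y = 1, so conditioning on X also selects on Z.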

| Causal object | Meaning | AI interpretation |
|---|---|---|
| Variable | Quantity in the causal system | Prompt feature, user action, treatment, tool call, exposure, label, reward |
| Mechanism | Assignment that generates a variable | Data pipeline, recommender policy, human behavior, model routing rule |
| Graph | Qualitative causal assumptions | What can affect what, and which paths may confound effects |
| Intervention | Replacement of a mechanism | A/B rollout, policy switch, prompt template change, retrieval update |
| Counterfactual | Unit-level alternate world | What this user or model trace would have done under another action |

Three examples of causal questions that call for these rules:

  1. A recommender team wants the causal effect of ranking a document higher, not merely the correlation between rank and clicks.
  2. An LLM platform changes a safety policy and wants to estimate whether refusals changed because of the policy or because user prompts shifted.
  3. A fairness auditor asks whether a proxy feature transmits an impermissible causal path into a model decision.

Two non-examples expose the boundary:

  1. A high predictive coefficient is not a causal effect unless the graph and intervention assumptions justify it.
  2. A plausible narrative produced by a language model is not a counterfactual unless it is grounded in a causal model.

The proof habit for Rule 1 is to name the graph operation before checking anything. Delete the arrows into X, then ask whether Z is d-separated from Y given X and W. More generally, conditioning restricts a distribution, intervention replaces a mechanism, and counterfactual reasoning updates exogenous uncertainty from evidence, changes a mechanism, then predicts.

observed association:      P(Y | X=x)
intervention question:     P(Y | do(X=x))
counterfactual question:   P(Y_x | E=e)
discovery question:        which G could have generated P(V)?
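Rule 1 can be checked numerically on a small model. The sketch below assumes a hypothetical binary SCM with X → Y, X → Z1, and Y → Z2, with illustrative numbers. It uses an exact-enumeration helper `prob` of our own, not a library function. In the graph with the arrows into X cut, Z1 is d-separated from Y given X, so the observation Z1 = 1 can be deleted. Z2 is a child of Y, so the condition fails.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue  # intervened variables take only their set value
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


# Hypothetical SCM (illustrative numbers): X -> Y, X -> Z1, Y -> Z2.
MECHS = {
    "X": lambda a: F(1, 2),
    "Y": lambda a: F(1, 5) + F(3, 5) * a["X"],
    "Z1": lambda a: F(1, 4) + F(1, 2) * a["X"],
    "Z2": lambda a: F(1, 10) + F(4, 5) * a["Y"],
}
ORDER = ["X", "Y", "Z1", "Z2"]

base = prob(MECHS, ORDER, {"Y": 1}, do={"X": 1})
# Z1 is separated from Y given X once arrows into X are cut: Rule 1 deletes it.
with_z1 = prob(MECHS, ORDER, {"Y": 1}, given={"Z1": 1}, do={"X": 1})
# Z2 is a child of Y, so the Rule 1 condition fails and the observation matters.
with_z2 = prob(MECHS, ORDER, {"Y": 1}, given={"Z2": 1}, do={"X": 1})
print(base, with_z1, with_z2)
```

Here `base` and `with_z1` both equal 4/5, while `with_z2` is 36/37. Observing a consequence of Y stays informative even after X has been set.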

In machine learning, do-calculus is valuable because models are often deployed under interventions: ranking changes, policy changes, safety filters, tool-use gates, data collection changes, and human feedback loops. Prediction alone does not tell us which change caused which downstream behavior.

Notebook implementation will use synthetic SCMs and small graphs. This keeps the examples executable while preserving the conceptual split between identification and estimation.

Checklist for using the three rules responsibly:

  • State the causal question before choosing a method.
  • Draw or describe the assumed causal graph.
  • Mark observed, latent, treatment, outcome, and adjustment variables.
  • Separate intervention notation from conditioning notation.
  • Decide whether the query is identifiable before estimating it.
  • Report assumptions that cannot be tested from the observed data alone.
  • Use ML as an estimation aid, not as a substitute for causal design.

This chapter follows the boundary set by Chapter 21. Statistical learning theory controls prediction error under distributional assumptions. Causal inference asks what happens when the distribution changes because something is done.

Modern AI systems make this distinction unavoidable. A foundation model can predict which action historically followed a context, but a decision system needs to know what would happen if it took a different action in that context.

Thus, do-calculus is not an abstract philosophical add-on. It is a production and research tool for deciding which model, prompt, policy, feature, or intervention actually changed an outcome.

A final diagnostic question is whether the claim would survive a policy change. If the answer depends only on a historical correlation, it belongs in predictive modeling. If the answer depends on what mechanism is replaced and which paths remain active, it belongs in causal inference.

| Diagnostic question | Causal discipline it tests |
|---|---|
| What is being changed? | Intervention target |
| Which mechanism is replaced? | SCM modularity |
| Which paths transmit the effect? | Graph semantics |
| Which variables are merely observed? | Conditioning versus intervention |
| Which quantities are unobserved? | Confounding and counterfactual uncertainty |
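The policy-change test can be run on a small confounded model. The sketch assumes a hypothetical binary SCM (Z → X, Z → Y, X → Y) with fixed mechanisms for Z and Y and two illustrative treatment policies. The old policy treats high-Z units more often, and the new one is a randomized rollout. The helper `prob` is our own exact-enumeration function, and `model` is a hypothetical constructor. The observational quantity moves with the policy. The interventional one does not, because the policy is exactly the mechanism that do(·) replaces.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


def model(policy):
    """Hypothetical SCM Z -> X -> Y, Z -> Y; `policy(z)` is P(X = 1 | Z = z)."""
    return {
        "Z": lambda a: F(1, 2),
        "X": lambda a: policy(a["Z"]),
        "Y": lambda a: F(1, 10) + F(3, 10) * a["X"] + F(2, 5) * a["Z"],
    }


ORDER = ["Z", "X", "Y"]
old = model(lambda z: F(4, 5) if z else F(1, 5))  # treats high-Z units more often
new = model(lambda z: F(1, 2))                    # randomized rollout

results = {}
for name, m in [("old", old), ("new", new)]:
    seen = prob(m, ORDER, {"Y": 1}, {"X": 1})     # P(Y=1 | X=1)
    done = prob(m, ORDER, {"Y": 1}, do={"X": 1})  # P(Y=1 | do(X=1))
    results[name] = (seen, done)
print(results)
```

Under the old policy the two quantities are 18/25 and 3/5. Under randomization both are 3/5, because randomizing X removes the Z → X arrow.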

4.2 Action/observation exchange (Rule 2)

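Rule 2 states when an action may be exchanged for an observation of the same variable:

$$P(y \mid \operatorname{do}(x), \operatorname{do}(z), w) = P(y \mid \operatorname{do}(x), z, w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}},$$

where $G_{\overline{X}\underline{Z}}$ deletes the arrows into X and the arrows out of Z. Removing Z's outgoing arrows leaves only the non-causal (back-door) routes between Z and Y. If X and W block them, then seeing Z = z says the same about Y as setting Z = z.

Applied to a treatment X with no prior intervention and conditioning set Z, the rule gives P(y | do(x), z) = P(y | x, z) whenever Z d-separates X from Y in $G_{\underline{X}}$, that is, whenever Z blocks every back-door path from X to Y. That is the key step of back-door adjustment.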

$$P(Y \mid \operatorname{do}(X=x)) \ne P(Y \mid X=x) \quad \text{in general}.$$

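The inequality is why Rule 2 carries a graph condition. Replacing do(x) by x is licensed only when no back-door path between X and Y remains open. When a confounder opens one, the two sides differ. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.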

The proof habit for Rule 2 is again to name the graph operation. Delete the arrows into X and the arrows out of Z, then check whether Z is d-separated from Y given X and W.
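A numerical check of the exchange. It assumes a hypothetical binary SCM with Z → X, Z → Y, X → Y and illustrative numbers. The exact-enumeration helper `prob` is our own, not a library API. Conditioning on Z blocks the only back-door path X ← Z → Y, so Rule 2 predicts P(Y | do(X), Z) = P(Y | X, Z) in every stratum. Without Z in the conditioning set, the exchange fails.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


# Hypothetical SCM (illustrative numbers): Z -> X, Z -> Y, X -> Y.
MECHS = {
    "Z": lambda a: F(1, 2),
    "X": lambda a: F(4, 5) if a["Z"] else F(1, 5),
    "Y": lambda a: F(1, 10) + F(3, 10) * a["X"] + F(2, 5) * a["Z"],
}
ORDER = ["Z", "X", "Y"]

# Rule 2 with conditioning set {Z}: setting X and seeing X agree in every Z stratum.
exchange_ok = all(
    prob(MECHS, ORDER, {"Y": 1}, {"Z": z}, do={"X": x})
    == prob(MECHS, ORDER, {"Y": 1}, {"X": x, "Z": z})
    for x, z in product([0, 1], repeat=2)
)
# Without Z, the back-door path is open and seeing X differs from setting X.
unconditional = (prob(MECHS, ORDER, {"Y": 1}, do={"X": 1}),
                 prob(MECHS, ORDER, {"Y": 1}, {"X": 1}))
print(exchange_ok, unconditional)
```

`exchange_ok` is `True`. `unconditional` pairs 3/5 for P(Y=1 | do(X=1)) with 18/25 for P(Y=1 | X=1).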

4.3 Insertion and deletion of actions (Rule 3)

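Rule 3 states when an action may be inserted or deleted altogether:

$$P(y \mid \operatorname{do}(x), \operatorname{do}(z), w) = P(y \mid \operatorname{do}(x), w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}},$$

where Z(W) is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$. Intuitively, the condition holds when setting Z has no influence on Y once X is fixed and W is observed, so the intervention on Z can be dropped. Arrows into Z-nodes that are ancestors of W are kept. Setting such a node can change W, and because W is conditioned on, that change can carry information to Y.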

$$P(Y \mid \operatorname{do}(X=x))=\sum_z P(Y \mid X=x,Z=z)P(Z=z).$$

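This is the back-door adjustment formula. It is valid when Z blocks every back-door path from X to Y and contains no descendant of X. First write P(y | do(x)) = Σ_z P(y | do(x), z) P(z | do(x)). Rule 2 (action/observation exchange) then turns P(y | do(x), z) into P(y | x, z). Rule 3 (deletion of actions) turns P(z | do(x)) into P(z), because Z is not a descendant of X. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.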
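The adjustment can be checked against surgery on a hypothetical model in which Z satisfies the back-door criterion (Z → X, Z → Y, X → Y, with illustrative numbers). The sketch uses an exact-enumeration helper `prob` of our own. `backdoor` touches only observational probabilities, while `truth` is computed by replacing X's mechanism.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


# Hypothetical SCM (illustrative numbers): Z -> X, Z -> Y, X -> Y.
MECHS = {
    "Z": lambda a: F(1, 2),
    "X": lambda a: F(4, 5) if a["Z"] else F(1, 5),
    "Y": lambda a: F(1, 10) + F(3, 10) * a["X"] + F(2, 5) * a["Z"],
}
ORDER = ["Z", "X", "Y"]


def backdoor(x):
    """Sum over z of P(Y=1 | X=x, Z=z) P(Z=z): observational terms only."""
    return sum(prob(MECHS, ORDER, {"Y": 1}, {"X": x, "Z": z}) * prob(MECHS, ORDER, {"Z": z})
               for z in (0, 1))


adjusted = [backdoor(x) for x in (0, 1)]
truth = [prob(MECHS, ORDER, {"Y": 1}, do={"X": x}) for x in (0, 1)]
print(adjusted, truth)
```

Both lists come out as 3/10 and 3/5. Replacing P(Z=z) by P(Z=z | X=x) inside `backdoor` would collapse the sum to the naive P(Y=1 | X=x). Weighting by the marginal of Z is what removes the confounding.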

The proof habit for Rule 3 is to name the graph operation. Delete the arrows into X and into every Z-node that is not an ancestor of W, then check whether Z is d-separated from Y given X and W.
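The simplest case of Rule 3 is a treatment with no causal route to the outcome. The sketch assumes a hypothetical two-variable SCM Y → X, in which X is an effect of Y, with illustrative numbers. It uses an exact-enumeration helper `prob` of our own. Cutting the arrows into X leaves Y and X separated, so the action can be deleted: P(Y | do(X=x)) = P(Y). Observing X is still informative about Y.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


# Hypothetical SCM (illustrative numbers): Y -> X, so X has no causal route to Y.
MECHS = {
    "Y": lambda a: F(3, 10),
    "X": lambda a: F(1, 5) + F(3, 5) * a["Y"],
}
ORDER = ["Y", "X"]

marginal = prob(MECHS, ORDER, {"Y": 1})
# Rule 3: with arrows into X cut, Y and X are separated, so do(X) can be dropped.
interventional = [prob(MECHS, ORDER, {"Y": 1}, do={"X": x}) for x in (0, 1)]
# Observing X is still informative about its cause Y.
observational = [prob(MECHS, ORDER, {"Y": 1}, {"X": x}) for x in (0, 1)]
print(marginal, interventional, observational)
```

Both interventional values equal the marginal 3/10. The observational values are 3/31 for X = 0 and 12/19 for X = 1.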

4.4 Graph conditions for each rule

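All three rules share one shape: a d-separation statement (Y ⊥ Z | X, W) checked in a mutilated graph. Only the surgery differs. An overbar on a set deletes the arrows into it, and an underbar deletes the arrows out of it.

  • Rule 1 (observations): test in $G_{\overline{X}}$.
  • Rule 2 (exchange): test in $G_{\overline{X}\underline{Z}}$.
  • Rule 3 (actions): test in $G_{\overline{X}\,\overline{Z(W)}}$, where Z(W) is the set of Z-nodes that are not ancestors of W in $G_{\overline{X}}$.

The overbar on X reflects the intervention already in force. The surgery on Z encodes the change being proposed.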
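Each rule's condition is a d-separation test in a mutilated graph, which makes it mechanical enough to check in code. The sketch below is one way to do it. It assumes a DAG given as a list of (parent, child) edges and uses the moralization test for d-separation: keep the ancestral subgraph, marry co-parents, drop directions, delete the conditioning set, and test connectivity. Every function name here is our own. It is exercised on the front-door graph U → X → Z → Y with U → Y, where U is latent.

```python
from collections import deque


def ancestors(edges, nodes):
    """`nodes` together with every ancestor of them in the DAG `edges`."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, xs, ys, zs):
    """(xs ⊥ ys | zs): moralize the ancestral subgraph, delete zs, test connectivity."""
    keep = ancestors(edges, xs | ys | zs)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    nbrs = {v: set() for v in keep}
    co_parents = {}
    for a, b in sub:
        nbrs[a].add(b)
        nbrs[b].add(a)
        co_parents.setdefault(b, []).append(a)
    for ps in co_parents.values():  # "marry" parents that share a child
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    seen, queue = set(xs), deque(xs)
    while queue:  # breadth-first search that never enters zs
        v = queue.popleft()
        if v in ys:
            return False
        for w in nbrs[v] - zs - seen:
            seen.add(w)
            queue.append(w)
    return True


def cut_in(edges, s):   # overbar: delete arrows into s
    return [(a, b) for a, b in edges if b not in s]


def cut_out(edges, s):  # underbar: delete arrows out of s
    return [(a, b) for a, b in edges if a not in s]


def rule1(edges, y, z, x, w):
    return d_separated(cut_in(edges, x), y, z, x | w)


def rule2(edges, y, z, x, w):
    return d_separated(cut_out(cut_in(edges, x), z), y, z, x | w)


def rule3(edges, y, z, x, w):
    g_x = cut_in(edges, x)
    z_w = z - ancestors(g_x, w)  # Z-nodes that are not ancestors of W in G_{bar X}
    return d_separated(cut_in(g_x, z_w), y, z, x | w)


# Front-door graph; the latent U must still appear in the DAG.
FRONT = [("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")]
checks = {
    "P(z|do(x),u) = P(z|do(x))": rule1(FRONT, {"Z"}, {"U"}, {"X"}, set()),
    "P(z|do(x)) = P(z|x)": rule2(FRONT, {"Z"}, {"X"}, set(), set()),
    "P(y|do(x)) = P(y|x)": rule2(FRONT, {"Y"}, {"X"}, set(), set()),
    "P(y|do(x),z) = P(y|do(x),do(z))": rule2(FRONT, {"Y"}, {"Z"}, {"X"}, set()),
    "P(y|do(x),do(z)) = P(y|do(z))": rule3(FRONT, {"Y"}, {"X"}, {"Z"}, set()),
}
print(checks)
```

Only the third check fails, because the latent path X ← U → Y stays open. The other four are the steps behind the front-door formula that follows. The rule functions take their arguments in the order (Y, Z, X, W) used in the rule statements.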

$$P(Y \mid \operatorname{do}(X=x))=\sum_z P(Z=z \mid X=x)\sum_{x'}P(Y \mid X=x',Z=z)P(X=x').$$

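This is the front-door formula. It identifies the effect even when an unobserved U confounds X and Y, provided three conditions hold:

  • Z intercepts every directed path from X to Y.
  • No back-door path from X to Z is open.
  • X blocks every back-door path from Z to Y.

Each factor is licensed by a graph condition. Rule 2 (action/observation exchange) gives P(z | do(x)) = P(z | x). Rule 2 again exchanges the observation z for do(z) in P(y | do(x), z). Rule 3 (deletion of actions) then drops do(x). Back-door adjustment on X supplies P(y | do(z)). The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.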
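A numerical check of the front-door formula. It assumes a hypothetical binary SCM U → X, U → Y, X → Z, Z → Y with U unobserved and illustrative numbers. The helper `prob` is our own exact-enumeration function. It marginalizes U, which is what observing only (X, Z, Y) amounts to. `front_door` never conditions on U, while `truth` performs surgery on X in the full model.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


# Hypothetical SCM (illustrative numbers): U -> X, U -> Y, X -> Z, Z -> Y; U is latent.
MECHS = {
    "U": lambda a: F(1, 2),
    "X": lambda a: F(4, 5) if a["U"] else F(1, 5),
    "Z": lambda a: F(3, 4) if a["X"] else F(1, 4),
    "Y": lambda a: F(1, 10) + F(1, 2) * a["Z"] + F(3, 10) * a["U"],
}
ORDER = ["U", "X", "Z", "Y"]


def front_door(x):
    """sum_z P(z | x) sum_x' P(Y=1 | x', z) P(x'), using (X, Z, Y) only."""
    total = F(0)
    for z in (0, 1):
        inner = sum(prob(MECHS, ORDER, {"Y": 1}, {"X": xp, "Z": z})
                    * prob(MECHS, ORDER, {"X": xp}) for xp in (0, 1))
        total += prob(MECHS, ORDER, {"Z": z}, {"X": x}) * inner
    return total


estimates = [front_door(x) for x in (0, 1)]
truth = [prob(MECHS, ORDER, {"Y": 1}, do={"X": x}) for x in (0, 1)]
print(estimates, truth)
```

Both give 3/8 for x = 0 and 5/8 for x = 1, even though the estimate never uses U.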

The proof habit here is to name the surgery before checking separation: which arrows are deleted, and why. An overbar marks a variable whose mechanism has been replaced. An underbar removes a variable's causal influence so that only its non-causal connections remain.

4.5 Proof intuition from d-separation

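d-separation is the bridge between the graph and probabilities. If Y and Z are d-separated by S in a DAG, then Y ⊥ Z | S in every distribution that factorizes according to that DAG. S blocks a path in either of two cases:

  • The path contains a chain A → B → C or a fork A ← B → C with B in S.
  • The path contains a collider A → B ← C with neither B nor any descendant of B in S.

An intervention do(x) produces a distribution that factorizes according to $G_{\overline{X}}$, so Rule 1 is this fact applied to the post-intervention model. Rules 2 and 3 follow the same pattern on a graph augmented with a switch variable for Z that selects between observing Z and setting it. Their conditions are the d-separation statements that make the switch irrelevant to Y.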

$$\operatorname{ATE}=\mathbb{E}[Y \mid \operatorname{do}(X=1)]-\mathbb{E}[Y \mid \operatorname{do}(X=0)].$$

Each rewrite that takes the ATE from its interventional form to an observational estimand rests on one such d-separation check. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.

The proof habit for d-separation is to enumerate the paths between the two sets and classify each interior node as a chain, fork, or collider. Conditioning on a chain or fork node blocks the path. Conditioning on a collider, or on one of its descendants, opens it. A path is blocked as soon as any one node on it blocks it.
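The three path patterns can be checked exactly on small hypothetical binary SCMs with illustrative numbers. The sketch uses two helpers of our own: `prob`, an exact-enumeration function, and `independent`. `independent` tests A ⊥ B | C for binary A by comparing P(A=1 | B=b, C=c) across b in every stratum c.

```python
from fractions import Fraction as F
from itertools import product


def prob(mechs, order, event, given=None, do=None):
    """Exact P(event | given) by enumeration; `do` pins variables, replacing their mechanisms."""
    given, do = given or {}, do or {}
    num = den = F(0)
    for vals in product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        if any(a[k] != v for k, v in do.items()):
            continue
        p = F(1)
        for v in order:
            if v not in do:
                p1 = mechs[v](a)
                p *= p1 if a[v] == 1 else 1 - p1
        if all(a[k] == v for k, v in given.items()):
            den += p
            if all(a[k] == v for k, v in event.items()):
                num += p
    return num / den


def independent(mechs, order, a, b, cond=()):
    """Exact test of a ⊥ b | cond for binary a: P(a=1 | b, cond) must not depend on b."""
    for vals in product([0, 1], repeat=len(cond)):
        stratum = dict(zip(cond, vals))
        if (prob(mechs, order, {a: 1}, {b: 0, **stratum})
                != prob(mechs, order, {a: 1}, {b: 1, **stratum})):
            return False
    return True


def coin(a):
    return F(1, 2)


def noisy_copy(src):
    """Hypothetical mechanism P(v = 1 | src) = 1/5 + 3/5 * src."""
    return lambda a: F(1, 5) + F(3, 5) * a[src]


MODELS = {  # hypothetical three-node SCMs over A, B, C
    "chain": ({"A": coin, "C": noisy_copy("A"), "B": noisy_copy("C")}, ["A", "C", "B"]),
    "fork": ({"C": coin, "A": noisy_copy("C"), "B": noisy_copy("C")}, ["C", "A", "B"]),
    "collider": ({"A": coin, "B": coin,
                  "C": lambda a: F(1, 10) + F(2, 5) * a["A"] + F(2, 5) * a["B"]},
                 ["A", "B", "C"]),
}
# (marginally independent?, independent given C?) for each pattern
patterns = {name: (independent(m, o, "A", "B"), independent(m, o, "A", "B", ("C",)))
            for name, (m, o) in MODELS.items()}
print(patterns)
```

The result is (False, True) for the chain and the fork and (True, False) for the collider. Conditioning on the middle node blocks the first two patterns and opens the third.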
