Causal Discovery: Part 5: Functional and Invariant Methods

5. Functional and Invariant Methods

This part of Chapter 22 develops the functional and invariant methods of causal discovery: approaches that infer causal direction from assumptions about the form of each mechanism or from its stability across environments. The treatment is causal, not merely predictive: the central objects are mechanisms, interventions, assumptions, and counterfactuals.

5.1 LiNGAM

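LiNGAM, the linear non-Gaussian acyclic model, belongs to the canonical scope of Causal Discovery. It assumes linear mechanisms with independent non-Gaussian noise, an assumption under which, absent hidden confounders, the causal graph becomes identifiable from observational data. The central move in causal inference is to distinguish a statistical relation from a claim about what would happen under an intervention.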

For this subsection, the working scope is constraint-based, score-based, functional, invariant, and optimization-based causal graph discovery with clear assumptions and evaluation metrics. The mathematical objects are variables, mechanisms, graphs, interventions, and assumptions. A causal claim is incomplete until all five are visible.

A_{ij}=1 \iff X_i \to X_j.

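The formula defines the adjacency matrix A of the causal graph: entry A_{ij} is 1 exactly when X_i is a direct cause of X_j, and this matrix is what LiNGAM tries to recover from data. It should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.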
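As a minimal sketch of the encoding A_{ij}=1 \iff X_i \to X_j, the snippet below builds the matrix from a list of directed edges. The three-variable graph and the helper name adjacency_from_edges are illustrative assumptions, not part of the lesson.

```python
import numpy as np

def adjacency_from_edges(n_vars, edges):
    """Return A with A[i, j] = 1 exactly when there is an edge X_i -> X_j."""
    A = np.zeros((n_vars, n_vars), dtype=int)
    for i, j in edges:
        A[i, j] = 1
    return A

# Hypothetical graph: X0 -> X1, X1 -> X2, X0 -> X2
A = adjacency_from_edges(3, [(0, 1), (1, 2), (0, 2)])

# Parents of X2 are the rows that carry a 1 in column 2.
parents_of_x2 = np.flatnonzero(A[:, 2]).tolist()
```

Reading a column gives the parents of a variable and reading a row gives its children, which is why the direction convention has to be fixed before any two graphs are compared.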

Causal object	Meaning	AI interpretation
Variable	Quantity in the causal system	Prompt feature, user action, treatment, tool call, exposure, label, reward
Mechanism	Assignment that generates a variable	Data pipeline, recommender policy, human behavior, model routing rule
Graph	Qualitative causal assumptions	What can affect what, and which paths may confound effects
Intervention	Replacement of a mechanism	A/B rollout, policy switch, prompt template change, retrieval update
Counterfactual	Unit-level alternate world	What this user or model trace would have done under another action

Three examples of causal questions that motivate LiNGAM:

A recommender team wants the causal effect of ranking a document higher, not merely the correlation between rank and clicks.
An LLM platform changes a safety policy and wants to estimate whether refusals changed because of the policy or because user prompts shifted.
A fairness auditor asks whether a proxy feature transmits an impermissible causal path into a model decision.

Two non-examples expose the boundary:

A high predictive coefficient is not a causal effect unless the graph and intervention assumptions justify it.
A plausible narrative produced by a language model is not a counterfactual unless it is grounded in a causal model.

The proof habit for LiNGAM is to name the graph operation. Conditioning restricts a distribution. Intervention replaces a mechanism. Counterfactual reasoning updates exogenous uncertainty from evidence, changes a mechanism, then predicts.

observed association:      P(Y | X=x)
intervention question:     P(Y | do(X=x))
counterfactual question:   P(Y_x | E=e)
discovery question:        which G could have generated P(V)?
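The gap between the first two lines can be made concrete with a small simulation. This is a sketch under stated assumptions: the confounded linear SCM, its coefficients, and the variable names are all hypothetical and chosen only so that the two slopes differ visibly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical SCM with a confounder Z: Z -> X, Z -> Y, X -> Y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

# Observed association: slope of E[Y | X = x], which also absorbs the path through Z.
obs_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Intervention: replace the mechanism of X so that it no longer listens to Z.
x_do = rng.normal(size=n)
y_do = 1.0 * x_do + 2.0 * z + rng.normal(size=n)
do_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do, ddof=1)
```

In this toy system the observational slope comes out near 2 while the interventional slope comes out near the structural coefficient 1, because conditioning on X also carries information about Z.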

In machine learning, LiNGAM is valuable because models are often deployed under interventions: ranking changes, policy changes, safety filters, tool-use gates, data collection changes, and human feedback loops. Prediction alone does not tell us which change caused which downstream behavior.

Notebook implementation will use synthetic SCMs and small graphs. This keeps the examples executable while preserving the conceptual split between identification and estimation.
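One way such a synthetic experiment might look for LiNGAM is sketched below for two variables. It is not the ICA-based or DirectLiNGAM estimator; it only illustrates the underlying idea that, with linear mechanisms and non-Gaussian noise, the regression residual is independent of the regressor in the causal direction only. The dependence proxy (correlation between squared residual and squared regressor), the uniform noise, and the coefficient 0.8 are assumptions of this sketch.

```python
import numpy as np

def residual_dependence(cause, effect):
    """Linear regression of effect on cause, then |corr(residual^2, cause^2)|."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = (cause @ effect) / (cause @ cause)
    resid = effect - slope * cause
    return abs(np.corrcoef(resid**2, cause**2)[0, 1])

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, size=n)             # non-Gaussian cause
y = 0.8 * x + rng.uniform(-1, 1, size=n)   # linear mechanism, non-Gaussian noise

score_xy = residual_dependence(x, y)  # candidate direction X -> Y
score_yx = residual_dependence(y, x)  # candidate direction Y -> X
direction = "X -> Y" if score_xy < score_yx else "Y -> X"
```

With Gaussian noise both scores would be close to zero and the direction would not be recoverable this way, which is the reason the non-Gaussian assumption matters.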

Checklist for using LiNGAM responsibly:

State the causal question before choosing a method.
Draw or describe the assumed causal graph.
Mark observed, latent, treatment, outcome, and adjustment variables.
Separate intervention notation from conditioning notation.
Decide whether the query is identifiable before estimating it.
Report assumptions that cannot be tested from the observed data alone.
Use ML as an estimation aid, not as a substitute for causal design.

This chapter follows the boundary set by Chapter 21. Statistical learning theory controls prediction error under distributional assumptions. Causal inference asks what happens when the distribution changes because something is done.

Modern AI systems make this distinction unavoidable. A foundation model can predict which action historically followed a context, but a decision system needs to know what would happen if it took a different action in that context.

Thus, LiNGAM is not an abstract philosophical add-on. It is a production and research tool for deciding which model, prompt, policy, feature, or intervention actually changed an outcome.

A final diagnostic question is whether the claim would survive a policy change. If the answer depends only on a historical correlation, it belongs in predictive modeling. If the answer depends on what mechanism is replaced and which paths remain active, it belongs in causal inference.

Diagnostic question	Causal discipline it tests
What is being changed?	Intervention target
Which mechanism is replaced?	SCM modularity
Which paths transmit the effect?	Graph semantics
Which variables are merely observed?	Conditioning versus intervention
Which quantities are unobserved?	Confounding and counterfactual uncertainty

5.2 Additive Noise Models

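Additive noise models belong to the canonical scope of Causal Discovery. An additive noise model writes each variable as a function of its direct causes plus an independent noise term. The central move in causal inference is to distinguish a statistical relation from a claim about what would happen under an intervention.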

h(W)=\operatorname{tr}(e^{W\odot W})-d=0.

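The formula is a continuous acyclicity constraint of the kind used in optimization-based discovery: for a weighted adjacency matrix W over d variables, with \odot the elementwise product, h(W)=0 holds exactly when W describes a directed acyclic graph. Additive noise models, like the other structural models in this part, presuppose such a graph. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.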
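The constraint h(W) can be checked numerically on small graphs. The sketch below approximates the matrix exponential with a truncated power series instead of a library routine; the function name acyclicity, the number of series terms, and both example matrices are assumptions made for illustration.

```python
import numpy as np

def acyclicity(W, n_terms=30):
    """Approximate h(W) = tr(exp(W * W)) - d with a truncated power series."""
    d = W.shape[0]
    M = W * W                 # elementwise square, so every entry is >= 0
    term = np.eye(d)
    trace_sum = float(d)      # k = 0 term: tr(I) = d
    for k in range(1, n_terms + 1):
        term = term @ M / k   # term now equals M^k / k!
        trace_sum += np.trace(term)
    return trace_sum - d

W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])   # X0 -> X1 -> X2, no cycle
W_cyc = np.array([[0.0, 0.5],
                  [0.5, 0.0]])        # X0 -> X1 and X1 -> X0 form a 2-cycle

h_dag = acyclicity(W_dag)   # zero: every power of M has zero trace
h_cyc = acyclicity(W_cyc)   # positive: the cycle contributes to tr(M^2)
```

The trace of M^k sums the weights of closed walks of length k, so the value is zero only when no closed walk, and therefore no cycle, exists.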

Three examples of causal questions that motivate additive noise models:

A recommender team wants the causal effect of ranking a document higher, not merely the correlation between rank and clicks.
An LLM platform changes a safety policy and wants to estimate whether refusals changed because of the policy or because user prompts shifted.
A fairness auditor asks whether a proxy feature transmits an impermissible causal path into a model decision.

Two non-examples expose the boundary:

A high predictive coefficient is not a causal effect unless the graph and intervention assumptions justify it.
A plausible narrative produced by a language model is not a counterfactual unless it is grounded in a causal model.

The proof habit for additive noise models is to name the graph operation. Conditioning restricts a distribution. Intervention replaces a mechanism. Counterfactual reasoning updates exogenous uncertainty from evidence, changes a mechanism, then predicts.

observed association:      P(Y | X=x)
intervention question:     P(Y | do(X=x))
counterfactual question:   P(Y_x | E=e)
discovery question:        which G could have generated P(V)?
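The three steps of counterfactual reasoning named above (update the exogenous noise from evidence, change a mechanism, predict) can be traced in a one-equation SCM. The model Y = 2X + U and the observed values are hypothetical and serve only to make each step visible.

```python
def counterfactual_y(x_obs, y_obs, x_new, slope=2.0):
    """Three-step counterfactual in the hypothetical SCM  Y = slope * X + U."""
    u = y_obs - slope * x_obs      # abduction: recover this unit's noise from evidence
    x = x_new                      # action: replace the mechanism of X with a constant
    return slope * x + u           # prediction: rerun the unchanged mechanism of Y

# Evidence: this unit had X = 1 and Y = 3. Query: what would Y have been under X = 0?
y_cf = counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=0.0)
```

The answer keeps the unit's own noise value U = 1, which is what separates the counterfactual quantity P(Y_x | E=e) from the population-level P(Y | do(X=x)).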

In machine learning, additive noise models are valuable because models are often deployed under interventions: ranking changes, policy changes, safety filters, tool-use gates, data collection changes, and human feedback loops. Prediction alone does not tell us which change caused which downstream behavior.

Notebook implementation will use synthetic SCMs and small graphs. This keeps the examples executable while preserving the conceptual split between identification and estimation.

Checklist for using additive noise models responsibly:

State the causal question before choosing a method.
Draw or describe the assumed causal graph.
Mark observed, latent, treatment, outcome, and adjustment variables.
Separate intervention notation from conditioning notation.
Decide whether the query is identifiable before estimating it.
Report assumptions that cannot be tested from the observed data alone.
Use ML as an estimation aid, not as a substitute for causal design.

Thus, additive noise models are not an abstract philosophical add-on. They are production and research tools for deciding which model, prompt, policy, feature, or intervention actually changed an outcome.

5.3 Invariant Causal Prediction

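Invariant causal prediction (ICP) belongs to the canonical scope of Causal Discovery. It searches for sets of predictors whose relationship to the target stays the same across environments. The central move in causal inference is to distinguish a statistical relation from a claim about what would happen under an intervention.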

\operatorname{SHD}(G,\widehat{G})=\#\{\text{edge additions, deletions, reversals}\}.

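The formula defines the structural Hamming distance (SHD) between a true graph G and an estimate \widehat{G}: the number of edge additions, deletions, and reversals that separate them. It is an evaluation metric for the output of a discovery method such as invariant causal prediction, not a statement about mechanisms. Elsewhere in causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.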
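A direct way to compute the SHD from two adjacency matrices is sketched below. It follows the convention in the formula, where a reversed edge counts once; some implementations count a reversal as two errors, so the convention should be reported with the number. The function name shd and both example graphs are illustrative assumptions.

```python
import numpy as np

def shd(A_true, A_est):
    """Count node pairs whose edge status differs; a reversal counts once."""
    d = A_true.shape[0]
    distance = 0
    for i in range(d):
        for j in range(i + 1, d):
            true_pair = (A_true[i, j], A_true[j, i])
            est_pair = (A_est[i, j], A_est[j, i])
            if true_pair != est_pair:
                distance += 1
    return distance

A_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])   # X0 -> X1 -> X2
A_est = np.array([[0, 1, 1],
                  [0, 0, 0],
                  [0, 1, 0]])    # X0 -> X1 kept, X1 -> X2 reversed, X0 -> X2 added

distance = shd(A_true, A_est)    # one reversal plus one addition
```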

Three examples of causal questions that motivate invariant causal prediction:

A recommender team wants the causal effect of ranking a document higher, not merely the correlation between rank and clicks.
An LLM platform changes a safety policy and wants to estimate whether refusals changed because of the policy or because user prompts shifted.
A fairness auditor asks whether a proxy feature transmits an impermissible causal path into a model decision.

Two non-examples expose the boundary:

A high predictive coefficient is not a causal effect unless the graph and intervention assumptions justify it.
A plausible narrative produced by a language model is not a counterfactual unless it is grounded in a causal model.

The proof habit for invariant causal prediction is to name the graph operation. Conditioning restricts a distribution. Intervention replaces a mechanism. Counterfactual reasoning updates exogenous uncertainty from evidence, changes a mechanism, then predicts.

observed association:      P(Y | X=x)
intervention question:     P(Y | do(X=x))
counterfactual question:   P(Y_x | E=e)
discovery question:        which G could have generated P(V)?

In machine learning, invariant causal prediction is valuable because models are often deployed under interventions: ranking changes, policy changes, safety filters, tool-use gates, data collection changes, and human feedback loops. Prediction alone does not tell us which change caused which downstream behavior.

Notebook implementation will use synthetic SCMs and small graphs. This keeps the examples executable while preserving the conceptual split between identification and estimation.
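A synthetic experiment of this kind might look as follows for invariant causal prediction. The sketch is a simplification under stated assumptions: the SCM X1 -> Y -> X2, the two environments, and the crude z-score checks on residual mean and variance are all hypothetical stand-ins for the formal hypothesis tests a full ICP procedure would use.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # samples per environment

def simulate(shift):
    """Hypothetical SCM X1 -> Y -> X2; `shift` intervenes on X1 and X2, never on Y."""
    x1 = rng.normal(loc=2.0 * shift, size=n)
    y = 1.5 * x1 + rng.normal(size=n)
    x2 = y + rng.normal(loc=3.0 * shift, size=n)
    return np.column_stack([x1, x2]), y

X_a, y_a = simulate(0)   # observational environment
X_b, y_b = simulate(1)   # shifted environment
X, y = np.vstack([X_a, X_b]), np.concatenate([y_a, y_b])
env = np.repeat([0, 1], n)

def looks_invariant(subset, z_max=4.0):
    """Pooled least squares on `subset`, then compare residuals across environments."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r0, r1 = resid[env == 0], resid[env == 1]
    v0, v1 = r0.var(ddof=1), r1.var(ddof=1)
    z_mean = (r0.mean() - r1.mean()) / np.sqrt(v0 / n + v1 / n)
    z_var = np.log(v0 / v1) / np.sqrt(4.0 / (n - 1))
    return bool(abs(z_mean) < z_max and abs(z_var) < z_max)

subsets = [s for k in range(3) for s in itertools.combinations(range(2), k)]
accepted = [set(s) for s in subsets if looks_invariant(s)]

# Estimated causal predictors: variables contained in every accepted set.
parents = set.intersection(*accepted) if accepted else set()
```

Column 0 holds X1, the true parent of Y, and column 1 holds X2, a child of Y. Only the set containing X1 alone leaves residuals that look the same in both environments, so the intersection returns X1 and excludes the predictive but non-causal X2.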

Checklist for using invariant causal prediction responsibly:

State the causal question before choosing a method.
Draw or describe the assumed causal graph.
Mark observed, latent, treatment, outcome, and adjustment variables.
Separate intervention notation from conditioning notation.
Decide whether the query is identifiable before estimating it.
Report assumptions that cannot be tested from the observed data alone.
Use ML as an estimation aid, not as a substitute for causal design.

Thus, invariant causal prediction is not an abstract philosophical add-on. It is a production and research tool for deciding which model, prompt, policy, feature, or intervention actually changed an outcome.

5.4 Causal Discovery Across Environments

Causal discovery across environments belongs to the canonical scope of Causal Discovery. The central move in causal inference is to distinguish a statistical relation from a claim about what would happen under an intervention.

X_j=f_j(\operatorname{pa}_j)+N_j,\qquad N_j \perp\!\!\!\perp \operatorname{pa}_j.

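The formula is the additive noise model: each variable X_j is a function f_j of its parents \operatorname{pa}_j plus a noise term N_j that is independent of those parents. For causal discovery across environments, the working assumption is that f_j and the distribution of N_j stay fixed for every variable that is not itself intervened on. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.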
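The independence condition N_j \perp\!\!\!\perp \operatorname{pa}_j is what makes direction testable in the bivariate case: fit a nonlinear regression in each direction and ask in which one the residual looks independent of the regressor. The sketch below assumes a quadratic mechanism, a polynomial fit, and a crude dependence proxy (correlation of the squared residual with the regressor and its square); a kernel independence test such as HSIC is the more usual choice.

```python
import numpy as np

def anm_score(cause, effect, degree=3):
    """Fit effect = f(cause) + residual with a polynomial f, then score dependence."""
    coefs = np.polyfit(cause, effect, degree)
    resid = effect - np.polyval(coefs, cause)
    z = (cause - cause.mean()) / cause.std()
    r2 = resid**2
    # Crude proxy (an assumption of this sketch): does the residual spread
    # vary with the regressor, either linearly or quadratically?
    return max(abs(np.corrcoef(r2, z)[0, 1]), abs(np.corrcoef(r2, z**2)[0, 1]))

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, size=n)
y = x**2 + rng.normal(scale=0.2, size=n)   # X_j = f_j(pa_j) + N_j with f(x) = x^2

score_xy = anm_score(x, y)   # residual is close to the independent noise
score_yx = anm_score(y, x)   # residual spread depends strongly on y
direction = "X -> Y" if score_xy < score_yx else "Y -> X"
```

The comparison is only meaningful under the model's assumptions; in the linear Gaussian case, for example, both directions admit an additive noise fit and the scores carry no directional information.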

Three examples of causal questions that motivate causal discovery across environments:

A recommender team wants the causal effect of ranking a document higher, not merely the correlation between rank and clicks.
An LLM platform changes a safety policy and wants to estimate whether refusals changed because of the policy or because user prompts shifted.
A fairness auditor asks whether a proxy feature transmits an impermissible causal path into a model decision.

Two non-examples expose the boundary:

A high predictive coefficient is not a causal effect unless the graph and intervention assumptions justify it.
A plausible narrative produced by a language model is not a counterfactual unless it is grounded in a causal model.

The proof habit for causal discovery across environments is to name the graph operation. Conditioning restricts a distribution. Intervention replaces a mechanism. Counterfactual reasoning updates exogenous uncertainty from evidence, changes a mechanism, then predicts.

observed association:      P(Y | X=x)
intervention question:     P(Y | do(X=x))
counterfactual question:   P(Y_x | E=e)
discovery question:        which G could have generated P(V)?

In machine learning, causal discovery across environments is valuable because models are often deployed under interventions: ranking changes, policy changes, safety filters, tool-use gates, data collection changes, and human feedback loops. Prediction alone does not tell us which change caused which downstream behavior.

Notebook implementation will use synthetic SCMs and small graphs. This keeps the examples executable while preserving the conceptual split between identification and estimation.
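For the multi-environment setting, one such synthetic check is sketched below: when an environment changes only the mechanism of X, the regression of Y on X (the causal mechanism) should stay stable, while the regression of X on Y generally does not. The SCM X -> Y, the coefficient 2, and the two noise scales are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

def sample_env(x_scale):
    """Hypothetical SCM X -> Y; the environment only changes the mechanism of X."""
    x = rng.normal(scale=x_scale, size=n)
    y = 2.0 * x + rng.normal(size=n)
    return x, y

def slope(regressor, target):
    """Least-squares slope of target on regressor."""
    return np.cov(regressor, target)[0, 1] / np.var(regressor, ddof=1)

envs = [sample_env(1.0), sample_env(3.0)]
forward = [slope(x, y) for x, y in envs]    # Y given X: the causal mechanism
backward = [slope(y, x) for x, y in envs]   # X given Y: not a mechanism

forward_gap = abs(forward[0] - forward[1])      # small: mechanism is invariant
backward_gap = abs(backward[0] - backward[1])   # larger: anticausal fit shifts
```

The asymmetry between the two gaps is the signal that environment-based discovery methods exploit; with a single environment the two directions could not be told apart in this linear Gaussian system.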

Checklist for using causal discovery across environments responsibly:

State the causal question before choosing a method.
Draw or describe the assumed causal graph.
Mark observed, latent, treatment, outcome, and adjustment variables.
Separate intervention notation from conditioning notation.
Decide whether the query is identifiable before estimating it.
Report assumptions that cannot be tested from the observed data alone.
Use ML as an estimation aid, not as a substitute for causal design.

Thus, causal discovery across environments is not an abstract philosophical add-on. It is a production and research tool for deciding which model, prompt, policy, feature, or intervention actually changed an outcome.

5.5 Time-Series Causal Discovery Preview

Time-series causal discovery, previewed here, belongs to the canonical scope of Causal Discovery. The central move in causal inference is to distinguish a statistical relation from a claim about what would happen under an intervention.

A_{ij}=1 \iff X_i \to X_j.

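The formula gives a compact handle on time-series causal discovery: A_{ij} is 1 exactly when X_i is a direct cause of X_j. In a time-series setting the variables are additionally indexed by time, so an arrow can connect X_i at an earlier step to X_j at a later one. The formula should not be read as a purely algebraic identity. In causal inference, equations encode assumptions about mechanisms, missing variables, and which parts of the world remain stable under intervention.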
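A lagged version of the adjacency matrix can be estimated on a toy system with a lag-1 least-squares regression, in the spirit of Granger-style analysis. This is a sketch under strong assumptions (linear dynamics, a single known lag, no hidden confounders); the simulated system and the threshold of 0.2 are hypothetical and not a calibrated test.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000

# Hypothetical lag-1 system: X0 drives X1 one step later; X1 never drives X0.
series = np.zeros((T, 2))
for t in range(1, T):
    series[t, 0] = 0.5 * series[t - 1, 0] + rng.normal()
    series[t, 1] = 0.5 * series[t - 1, 1] + 0.8 * series[t - 1, 0] + rng.normal()

past, present = series[:-1], series[1:]

# coef[i, j] estimates the effect of X_i at time t-1 on X_j at time t.
coef, *_ = np.linalg.lstsq(past, present, rcond=None)

threshold = 0.2  # illustrative cutoff, not a calibrated test
A_lag = (np.abs(coef) > threshold).astype(int)
```

The off-diagonal entries of A_lag then follow the same convention as the formula: a 1 in position (0, 1) means X0 at the previous step affects X1 now, while the diagonal records each series' dependence on its own past.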

Three examples of causal questions that motivate time-series causal discovery:

A recommender team wants the causal effect of ranking a document higher, not merely the correlation between rank and clicks.
An LLM platform changes a safety policy and wants to estimate whether refusals changed because of the policy or because user prompts shifted.
A fairness auditor asks whether a proxy feature transmits an impermissible causal path into a model decision.

Two non-examples expose the boundary:

A high predictive coefficient is not a causal effect unless the graph and intervention assumptions justify it.
A plausible narrative produced by a language model is not a counterfactual unless it is grounded in a causal model.

The proof habit for time-series causal discovery is to name the graph operation. Conditioning restricts a distribution. Intervention replaces a mechanism. Counterfactual reasoning updates exogenous uncertainty from evidence, changes a mechanism, then predicts.

observed association:      P(Y | X=x)
intervention question:     P(Y | do(X=x))
counterfactual question:   P(Y_x | E=e)
discovery question:        which G could have generated P(V)?

In machine learning, time-series causal discovery is valuable because models are often deployed under interventions: ranking changes, policy changes, safety filters, tool-use gates, data collection changes, and human feedback loops. Prediction alone does not tell us which change caused which downstream behavior.

Notebook implementation will use synthetic SCMs and small graphs. This keeps the examples executable while preserving the conceptual split between identification and estimation.

Checklist for using time-series causal discovery responsibly:

State the causal question before choosing a method.
Draw or describe the assumed causal graph.
Mark observed, latent, treatment, outcome, and adjustment variables.
Separate intervention notation from conditioning notation.
Decide whether the query is identifiable before estimating it.
Report assumptions that cannot be tested from the observed data alone.
Use ML as an estimation aid, not as a substitute for causal design.

Thus, time-series causal discovery is not an abstract philosophical add-on. It is a production and research tool for deciding which model, prompt, policy, feature, or intervention actually changed an outcome.
