Lebesgue Integration: Part 2: Formal Definitions
2. Formal Definitions
This part develops the formal definitions of Lebesgue integration listed in the Chapter 24 table of contents. The treatment is measure-theoretic and AI-facing: every concept is tied to probability, expectation, density, or learning systems.
2.1 Measures and nonnegative simple functions
This subsection fixes the two objects everything else is built from: a measure on a measurable space and the nonnegative simple functions defined on it. The point is not to repeat introductory probability, but to expose the measurable structure that makes a probability statement valid.
Working scope for this part: simple functions, nonnegative integrals, signed integrals, convergence theorems, almost-everywhere equality, and ML expectations. The mathematical habit is to name the space, the sigma algebra, the measure, and the map before writing probabilities or expectations.
Operational definition.
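A measure on a measurable space $(\Omega,\mathcal{F})$ is a map $\mu:\mathcal{F}\to[0,\infty]$ with $\mu(\emptyset)=0$ that is countably additive over disjoint sets. A nonnegative simple function is a measurable function taking finitely many values in $[0,\infty)$. Lebesgue integration first integrates these simple functions, then extends to nonnegative measurable functions by monotone limits, and to signed functions by the decomposition $f=f^+-f^-$.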
Worked reading.
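For a nonnegative simple function $s=\sum_{i=1}^{n} a_i\,\mathbf{1}_{A_i}$ with $a_i\ge 0$ and $A_i\in\mathcal{F}$, the integral is $\int s\,d\mu=\sum_{i=1}^{n} a_i\,\mu(A_i)$. This is weighted averaging over measurable level sets.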
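On a finite space every subset can be taken as measurable and every function is simple, so the level-set formula $\int s\,d\mu=\sum_i a_i\,\mu(A_i)$ can be checked against a plain pointwise sum. The sketch below is a minimal illustration, assuming a measure given by made-up point masses on a four-element space; `point_mass`, `measure`, and `simple_integral` are hypothetical helpers, not part of any library.

```python
from fractions import Fraction

# Hypothetical finite measure space: Omega = {"a", "b", "c", "d"} with point masses.
# The power set is the sigma algebra, so every subset is measurable.
point_mass = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}

def measure(event):
    """mu(A) = sum of the point masses in A (additivity on a finite space)."""
    return sum((point_mass[w] for w in event), Fraction(0))

def simple_integral(terms):
    """Integral of s = sum_i a_i * 1_{A_i}, given as a list of (a_i, A_i) pairs with a_i >= 0."""
    return sum((a * measure(A) for a, A in terms), Fraction(0))

# s = 3 * 1_{a,b} + 1 * 1_{b,c}; its values are a: 3, b: 4, c: 1, d: 0.
s_terms = [(Fraction(3), {"a", "b"}), (Fraction(1), {"b", "c"})]

def s(w):
    return sum((a for a, A in s_terms if w in A), Fraction(0))

by_formula = simple_integral(s_terms)
pointwise = sum((s(w) * point_mass[w] for w in point_mass), Fraction(0))
print(by_formula, pointwise)  # 21/8 21/8
```

Exact `Fraction` arithmetic is used so the two readings can be compared with `==` rather than a floating-point tolerance.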
| Object | Measure-theoretic role | AI interpretation |
|---|---|---|
| $\Omega$ | Underlying outcome space | Hidden randomness behind data, sampling, initialization, or generation |
| $\mathcal{F}$ | Measurable events | Observable filters, logged events, queryable dataset subsets |
| $\mu$ or $P$ | Measure or probability | Data-generating law, empirical measure, proposal distribution, policy law |
| $X:\Omega\to\mathcal{X}$ | Measurable map | Feature extractor, tokenizer, embedding, model score, random variable |
| $\int f\,d\mu$ | Weighted aggregation | Expected loss, calibration metric, ELBO term, importance-weighted estimate |
Three examples of measures and nonnegative simple functions:
- Expected classification loss over a data distribution.
- Integral of a stepwise calibration curve.
- Mean reward under a policy distribution.
Two non-examples clarify the boundary:
- A nonmeasurable function.
- A measurable function whose positive and negative parts both have infinite integral, so that $\int f\,d\mu$ would be $\infty-\infty$.
Proof or verification habit for measures and nonnegative simple functions:
A simple function has many representations $\sum_i a_i\mathbf{1}_{A_i}$, so the integral must be shown not to depend on the choice. The standard argument passes to a common refinement of two representations and uses finite additivity and monotonicity of $\mu$. More generally, ask these questions in order:
- Set question: is the subset measurable?
- Function question: are inverse images measurable?
- Integral question: is the function measurable and integrable?
- Density question: is absolute continuity satisfied?
- ML question: which measure defines the population claim?
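The set and function questions can be made concrete on a finite space whose sigma algebra is generated by a partition, for instance the information carried by a coarse logged feature. This is a minimal sketch under that assumption; `generate_sigma_algebra` and `is_measurable` are hypothetical names, and the codomain is assumed to carry its power set.

```python
from itertools import combinations

def generate_sigma_algebra(partition):
    """All unions of partition blocks: the sigma algebra generated by a finite partition."""
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

def is_measurable(f, omega, sigma_algebra):
    """On a finite space (power set on the codomain), f is measurable iff every
    inverse image f^{-1}({y}) is an event; other inverse images are unions of these."""
    for y in {f(w) for w in omega}:
        preimage = frozenset(w for w in omega if f(w) == y)
        if preimage not in sigma_algebra:
            return False
    return True

omega = {1, 2, 3, 4}
# Hypothetical coarse information: we only observe whether the outcome is small (1, 2) or large (3, 4).
F = generate_sigma_algebra([{1, 2}, {3, 4}])

coarse = lambda w: "small" if w <= 2 else "large"  # constant on blocks: measurable
fine = lambda w: w % 2                              # separates 1 from 2: not measurable

print(len(F), is_measurable(coarse, omega, F), is_measurable(fine, omega, F))
```

A feature that refines the logged information, like `fine` here, is not a random variable on this sigma algebra, so no probability statement about it can be derived from the coarse log alone.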
In AI systems, this material matters because probability language is constantly compressed into informal notation. Measure theory expands the notation so that support, observability, null sets, and convergence assumptions become visible.
Expected loss is not a different object from integration; it is the Lebesgue integral of a loss random variable.
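Under the empirical measure of a finite validation set, the expected 0-1 loss is the integral of a simple function with values 0 and 1, so it equals the measure of the error event. The sketch below, with made-up labels and predictions, computes that number two ways: as the usual average of per-example losses and as $\sum_c c\,\mu(\{\ell=c\})$ over level sets.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical validation set: (label, prediction) pairs; the empirical measure puts mass 1/n on each.
pairs = [(0, 0), (1, 1), (1, 0), (0, 0), (2, 1), (2, 2), (1, 1), (0, 2)]
n = len(pairs)

def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1

losses = [zero_one_loss(y, y_hat) for y, y_hat in pairs]

# Way 1: sample average = integral against the empirical measure, point by point.
sample_average = Fraction(sum(losses), n)

# Way 2: Lebesgue view = sum over loss values c of c * mu({loss = c}).
level_set_mass = {c: Fraction(k, n) for c, k in Counter(losses).items()}
level_set_integral = sum((c * mass for c, mass in level_set_mass.items()), Fraction(0))

print(sample_average, level_set_integral, level_set_mass.get(1))  # 3/8 3/8 3/8
```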
Practical checklist:
- Name the measurable space before naming the probability.
- Identify whether the object is a set, function, measure, distribution, or derivative of measures.
- Check whether equality is pointwise, almost everywhere, or distributional.
- Check whether limits are moved through integrals and which theorem justifies the move.
- For density ratios, check support and absolute continuity before dividing.
- For ML claims, distinguish population measure, empirical measure, model measure, and proposal measure.
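The density-ratio item can be checked mechanically on a finite space: $P\ll Q$ means every point with $Q$-mass zero also has $P$-mass zero, and only then is $dP/dQ=p/q$ defined $Q$-almost everywhere. The sketch below assumes two hypothetical policies over four actions and a made-up reward; `radon_nikodym` is an illustrative helper, not a library function.

```python
from fractions import Fraction

# Hypothetical target policy P and logging (proposal) policy Q over four actions.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(1, 4)}
reward = {"a": 1, "b": 0, "c": 2, "d": 5}

def absolutely_continuous(p, q):
    """P << Q on a finite space: q(x) == 0 must imply p(x) == 0."""
    return all(p[x] == 0 for x in q if q[x] == 0)

def radon_nikodym(p, q):
    """dP/dQ on the support of Q; left undefined (omitted) where q(x) == 0."""
    if not absolutely_continuous(p, q):
        raise ValueError("P is not absolutely continuous w.r.t. Q; the ratio is not defined")
    return {x: p[x] / q[x] for x in q if q[x] > 0}

w = radon_nikodym(P, Q)
direct = sum(P[x] * reward[x] for x in P)               # E_P[reward]
reweighted = sum(Q[x] * w[x] * reward[x] for x in w)    # E_Q[(dP/dQ) * reward]
print(direct, reweighted, sum(Q[x] * w[x] for x in w))  # the weights integrate to 1 under Q
```

The reverse ratio $dQ/dP$ does not exist here, because $Q$ puts mass on action `d` where $P$ puts none; `radon_nikodym(Q, P)` raises instead of silently dividing by zero.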
Local diagnostic: verify that the function is measurable and that the integrals of its positive and negative parts are finite.
The notebook version of this subsection uses finite spaces, step functions, empirical measures, or simple density ratios. These toy cases keep the objects visible while preserving the exact logic used in continuous ML models.
The learner should leave this subsection able to translate between the compact ML notation and the full measure-theoretic statement.
| Compact ML notation | Expanded measure-theoretic reading |
|---|---|
| $X\sim P$ | A random element $X$ has law $P$ on a measurable space |
| $\mathbb{E}_P[\ell]$ | Lebesgue integral of a measurable loss $\ell$ under $P$ |
| $p(x)$ | Density with respect to a specified base measure |
| $p(x)/q(x)$ | Radon-Nikodym derivative $dP/dQ$ when domination $P\ll Q$ holds |
| train/test shift | Two probability measures on a shared measurable space |
A useful way to study this subsection is to keep three layers separate:
- Semantic layer: what real-world question is being asked?
- Measurable layer: which event, function, or measure represents that question?
- Computational layer: which sum, integral, sample average, or ratio estimates it?
For example, the semantic question may be whether a guardrail fails on a class of prompts. The measurable layer is an event in the prompt space. The computational layer is an empirical estimate under a validation or red-team distribution. Mixing these layers is how many probability arguments become ambiguous.
The same discipline applies to generative models. A generator is a measurable transformation of latent randomness. The generated distribution is the pushforward measure. A likelihood, density, or divergence is only meaningful after the target space, base measure, and support relation are clear.
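The generated distribution is the pushforward $g_\#\mu(B)=\mu(g^{-1}(B))$. A minimal discrete sketch, assuming a uniform latent variable on eight points and a hypothetical generator `generator` that maps latents to tokens, computes the generated law by pulling each output event back to the latent space.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical latent space: z in {0, ..., 7}, uniform.
latent = {z: Fraction(1, 8) for z in range(8)}

def generator(z):
    """A hypothetical deterministic generator from latents to tokens."""
    return "cat" if z < 4 else ("dog" if z < 7 else "<eos>")

def pushforward(mu, g):
    """Law of g under mu: (g_# mu)({y}) = mu(g^{-1}({y}))."""
    law = defaultdict(Fraction)
    for z, mass in mu.items():
        law[g(z)] += mass
    return dict(law)

def pushforward_event(mu, g, B):
    """(g_# mu)(B) = mu({z : g(z) in B}) for an output event B."""
    return sum((mass for z, mass in mu.items() if g(z) in B), Fraction(0))

law = pushforward(latent, generator)
print(law)
print(pushforward_event(latent, generator, {"cat", "<eos>"}))
```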
When reading ML papers, silently expand phrases like "sample from the model," "take expectation over data," and "density ratio" into this measure-theoretic checklist. This turns informal notation into a statement that can be checked.
| Reading move | Question to ask |
|---|---|
| "sample" | From which probability measure? |
| "event" | Is it in the sigma algebra? |
| "feature" | Is the feature map measurable? |
| "expectation" | Is the integrand integrable? |
| "density" | With respect to which base measure? |
| "ratio" | Does absolute continuity hold? |
This is the level of precision needed for high-stakes evaluation, off-policy learning, variational inference, and theoretical generalization arguments.
A final question to ask is whether the claim would still be meaningful if the dataset were infinite, the model output lived in a function space, or the event being queried were defined by a limiting process. Measure theory is what keeps the answer honest.
2.2 Integral of simple functions
This subsection makes the first step of the construction precise: the integral of a nonnegative simple function, and the check that it is well defined. Every later integral is built as a limit of these finite sums, and on a finite space, where every function is simple, the simple integral is already the whole story.
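Operational definition.
Write a nonnegative simple function in canonical form $s=\sum_{j=1}^{m} c_j\,\mathbf{1}_{\{s=c_j\}}$, where $c_1,\dots,c_m$ are its distinct values, and set $\int s\,d\mu=\sum_{j=1}^{m} c_j\,\mu(\{s=c_j\})$, with the convention $0\cdot\infty=0$. For $A\in\mathcal{F}$, write $\int_A s\,d\mu=\int s\,\mathbf{1}_A\,d\mu$.
Worked reading.
Any other representation $s=\sum_{i=1}^{n} a_i\,\mathbf{1}_{A_i}$ with $a_i\ge 0$ and $A_i\in\mathcal{F}$, not necessarily disjoint, gives the same value $\sum_{i=1}^{n} a_i\,\mu(A_i)$. The integral is linear with nonnegative coefficients and monotone: $s\le t$ implies $\int s\,d\mu\le\int t\,d\mu$. This is weighted averaging over measurable level sets.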
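The simple integral can be checked on a stepwise curve over $[0,1]$ with Lebesgue measure, such as a binned calibration curve. In this sketch, with made-up bin edges and heights, summing height times width bin by bin and summing each value times the Lebesgue measure of its level set (which may be a union of separate bins) give the same integral.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical step function on [0, 1]: bins [e_k, e_{k+1}) with constant heights.
edges = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
heights = [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5), Fraction(4, 5)]

# Bin-by-bin reading: add up height * width.
by_bins = sum(h * (b - a) for h, a, b in zip(heights, edges[:-1], edges[1:]))

# Level-set reading: group by value; lambda({s = c}) is the total width of the bins with height c.
level_set_length = defaultdict(Fraction)
for h, a, b in zip(heights, edges[:-1], edges[1:]):
    level_set_length[h] += b - a
by_level_sets = sum(c * length for c, length in level_set_length.items())

print(by_bins, by_level_sets)  # 9/20 9/20
```

Here the level set $\{s=1/5\}$ is the union of two disjoint bins, which is exactly the situation the Lebesgue definition handles without any reordering of the domain.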
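Three examples of integrals of simple functions:
- Expected 0-1 loss: $\int \mathbf{1}_{\{h(X)\neq Y\}}\,dP = P(h(X)\neq Y)$, the misclassification probability.
- The integral of a stepwise calibration curve over $[0,1]$: the sum of bin heights times bin widths under Lebesgue measure.
- Mean reward under a policy when the reward takes finitely many nonnegative values $r_j$: $\sum_j r_j\,P(R=r_j)$.
Two non-examples clarify the boundary:
- A function that takes finitely many values but has a nonmeasurable level set.
- $\mathbf{1}_A-\mathbf{1}_B$ with $\mu(A)=\mu(B)=\infty$: the naive value $\mu(A)-\mu(B)$ is $\infty-\infty$, which is why this step is restricted to nonnegative coefficients.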
Proof or verification habit for integrals of simple functions:
Given two representations of the same function, intersect their sets to get a common refinement: finitely many disjoint measurable cells on which both representations are constant. Finite additivity of $\mu$ shows that each representation's sum equals the same sum over the cells, so the integral does not depend on the representation. Linearity and monotonicity follow from the same refinement.
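Refining representations is itself an algorithm on a finite space: intersect every set (or its complement) from both representations, keep the nonempty cells, and sum value times measure over the cells. The sketch below, under that finite-space assumption and with hypothetical helper names, checks that an overlapping representation and the disjoint level-set representation of the same function give the same integral.

```python
from fractions import Fraction
from itertools import product

omega = ["a", "b", "c", "d"]
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}

def measure(event):
    return sum((mu[w] for w in event), Fraction(0))

def integral(rep):
    """sum_i a_i * mu(A_i) for a representation [(a_i, A_i), ...]."""
    return sum((a * measure(A) for a, A in rep), Fraction(0))

def common_refinement(*reps):
    """Nonempty cells obtained by intersecting each set, or its complement, across all representations."""
    sets = [frozenset(A) for rep in reps for _, A in rep]
    cells = set()
    for signs in product([True, False], repeat=len(sets)):
        cell = frozenset(w for w in omega
                         if all((w in A) == keep for A, keep in zip(sets, signs)))
        if cell:
            cells.add(cell)
    return cells

def value_on(rep, cell):
    w = next(iter(cell))  # every representation is constant on a cell, so any point works
    return sum((a for a, A in rep if w in A), Fraction(0))

# Two representations of the same function: overlapping sets vs. disjoint level sets.
rep1 = [(Fraction(3), {"a", "b"}), (Fraction(1), {"b", "c"})]
rep2 = [(Fraction(3), {"a"}), (Fraction(4), {"b"}), (Fraction(1), {"c"})]

cells = common_refinement(rep1, rep2)
assert all(value_on(rep1, C) == value_on(rep2, C) for C in cells)  # they define the same function
via_cells = sum((value_on(rep1, C) * measure(C) for C in cells), Fraction(0))
print(integral(rep1), integral(rep2), via_cells)
```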
2.3 Nonnegative measurable functions
This subsection extends the integral from simple functions to every measurable $f:\Omega\to[0,\infty]$. The value may be $+\infty$; nonnegativity is what makes the extension unambiguous, because no cancellation between positive and negative mass can occur.
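Operational definition.
For measurable $f\ge 0$, define $\int f\,d\mu=\sup\{\int s\,d\mu : s \text{ simple},\ 0\le s\le f\}$.
Worked reading.
The dyadic functions $s_n=\min\big(n,\,2^{-n}\lfloor 2^{n} f\rfloor\big)$ are simple, increase pointwise to $f$, and satisfy $\int s_n\,d\mu\uparrow\int f\,d\mu$. The integral is therefore the limit of weighted averages over finer and finer measurable level sets of $f$.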
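The dyadic simple functions $s_n=\min(n,\,2^{-n}\lfloor 2^n f\rfloor)$ increase pointwise to $f$. For $f(x)=x$ on $[0,1]$ with Lebesgue measure their level sets are intervals, so the integrals can be computed exactly. This sketch, assuming that specific $f$, shows $\int s_n\,d\lambda$ increasing to $\int f\,d\lambda=1/2$.

```python
from fractions import Fraction

def integral_of_dyadic_approx(n):
    """Integral over [0, 1] (Lebesgue measure) of s_n = min(n, floor(2^n x) / 2^n) for f(x) = x.

    For n >= 1 the cap n never binds because f <= 1. The level set {s_n = k/2^n} is the
    interval [k/2^n, (k+1)/2^n) for k = 0, ..., 2^n - 1 (plus the null set {1}), so each
    level set has Lebesgue measure 1/2^n.
    """
    width = Fraction(1, 2 ** n)
    return sum((k * width * width for k in range(2 ** n)), Fraction(0))

values = [integral_of_dyadic_approx(n) for n in range(1, 8)]
print(values)                                # increases toward 1/2, the integral of x over [0, 1]
print([Fraction(1, 2) - v for v in values])  # the gap is exactly 1/2^(n+1)
```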
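Three examples of nonnegative measurable functions:
- Pointwise cross-entropy loss $-\log p_\theta(y\mid x)$ for a classifier, which takes values in $[0,\infty]$.
- Squared error $(f_\theta(x)-y)^2$ under a data distribution.
- A density $p\ge 0$ with respect to Lebesgue measure, whose integral over a set is a probability.
Two non-examples clarify the boundary:
- A nonmeasurable function, such as the indicator of a Vitali set.
- A function that takes negative values; it must first be split into positive and negative parts.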
Proof or verification habit for nonnegative measurable functions:
Show that $\int s_n\,d\mu\to\int f\,d\mu$ for the dyadic approximations, which is the monotone convergence theorem in its simplest form. To move a limit through an integral of nonnegative functions, check that the sequence is pointwise nondecreasing; without monotonicity, Fatou's lemma gives only $\int \liminf_n f_n\,d\mu\le\liminf_n\int f_n\,d\mu$.
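With counting measure on $\mathbb{N}$, integrating a nonnegative $f$ is summing a series, and the truncations $f\,\mathbf{1}_{\{1,\dots,N\}}$ increase pointwise to $f$. In this floating-point sketch (`partial_integrals` is a hypothetical helper), monotone convergence says the truncated integrals converge to $\int f$, which is finite ($\pi^2/6$) for $f(n)=1/n^2$ and $+\infty$ for $f(n)=1/n$; both are legitimate values for a nonnegative integral.

```python
import math

def partial_integrals(f, checkpoints):
    """Integrals of f * 1_{1..N} against counting measure, for each N in an increasing list."""
    out, total, n = [], 0.0, 0
    for N in checkpoints:
        while n < N:
            n += 1
            total += f(n)
        out.append(total)
    return out

checkpoints = [10, 100, 1_000, 10_000, 100_000]
squares = partial_integrals(lambda n: 1.0 / n**2, checkpoints)
harmonic = partial_integrals(lambda n: 1.0 / n, checkpoints)

print([round(math.pi**2 / 6 - s, 6) for s in squares])  # gaps shrink toward 0
print([round(h, 3) for h in harmonic])                  # grows like log N: the integral is +infinity
```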
2.4 Signed functions via positive and negative parts
This subsection extends the integral to measurable functions that take both signs, such as centered losses, advantages, and log-density ratios. The extension splits the function into two nonnegative pieces and integrates each with the previous construction.
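Operational definition.
For measurable $f$, set $f^+=\max(f,0)$ and $f^-=\max(-f,0)$, so that $f=f^+-f^-$ and $|f|=f^++f^-$. Define $\int f\,d\mu=\int f^+\,d\mu-\int f^-\,d\mu$ whenever at least one of the two integrals is finite.
Worked reading.
For a centered loss $f=\ell-c$ with a fixed threshold $c$, $\int f^+\,d\mu$ collects the excess of the loss above $c$ and $\int f^-\,d\mu$ the shortfall below it; the signed integral is their difference.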
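With $f^+=\max(f,0)$ and $f^-=\max(-f,0)$, the decomposition on a finite space is a one-line computation. The sketch below uses made-up per-example losses under an empirical measure and a hypothetical threshold `c`; it checks $f=f^+-f^-$ and $|f|=f^++f^-$, and that the signed integral equals the mean loss minus $c$.

```python
from fractions import Fraction

# Hypothetical per-example losses under the empirical measure (mass 1/n each) and a threshold c.
losses = [Fraction(x, 10) for x in (2, 9, 5, 14, 1, 7)]
c = Fraction(1, 2)
n = len(losses)

f = [l - c for l in losses]                  # centered loss, takes both signs
f_plus = [max(v, Fraction(0)) for v in f]    # f+ = max(f, 0)
f_minus = [max(-v, Fraction(0)) for v in f]  # f- = max(-f, 0)

assert all(v == p - m for v, p, m in zip(f, f_plus, f_minus))
assert all(abs(v) == p + m for v, p, m in zip(f, f_plus, f_minus))

int_plus = sum(f_plus) / n
int_minus = sum(f_minus) / n
signed = int_plus - int_minus
print(int_plus, int_minus, signed, sum(losses) / n - c)
```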
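Three examples of signed functions via positive and negative parts:
- An advantage $A(s,a)=Q(s,a)-V(s)$, whose expectation over actions drawn from the same policy is zero.
- A log-density ratio $\log(p/q)$, which is positive where $p>q$ and negative where $p<q$.
- A centered score $g(X)-\mathbb{E}[g(X)]$ used in control variates.
Two non-examples clarify the boundary:
- A nonmeasurable function.
- $f(n)=(-1)^{n+1}/n$ on $\mathbb{N}$ with counting measure: the alternating series converges, but $\int f^+=\int f^-=\infty$, so the Lebesgue integral is undefined.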
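A standard instance of a function whose positive and negative parts both have infinite integral is $f(n)=(-1)^{n+1}/n$ on $\mathbb{N}$ with counting measure. In this floating-point sketch, the partial sums of $f$ settle near $\ln 2$, while the integrals of $f^+$ and $f^-$ over $\{1,\dots,N\}$ both grow without bound, roughly like $\tfrac12\ln N$; so $\int f^+-\int f^-$ has the form $\infty-\infty$ and the Lebesgue integral is undefined even though the series converges conditionally.

```python
import math

def parts_up_to(N):
    """Integrals over {1, ..., N} (counting measure) of f+, f-, and f for f(n) = (-1)^(n+1) / n."""
    pos = sum(1.0 / n for n in range(1, N + 1, 2))  # odd n: f(n) > 0, so f+(n) = 1/n
    neg = sum(1.0 / n for n in range(2, N + 1, 2))  # even n: f(n) < 0, so f-(n) = 1/n
    return pos, neg, pos - neg

for N in (10, 1_000, 100_000):
    pos, neg, partial = parts_up_to(N)
    print(N, round(pos, 3), round(neg, 3), round(partial, 5))
print(round(math.log(2), 5))
```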
Proof or verification habit for signed functions via positive and negative parts:
Check that the definition does not depend on how $f$ is written as a difference. If $f=g-h$ with $g,h\ge 0$ integrable, then $f^++h=f^-+g$, so additivity for nonnegative integrals gives $\int f^+\,d\mu-\int f^-\,d\mu=\int g\,d\mu-\int h\,d\mu$. Linearity of the signed integral follows from this identity.
2.5 Integrability and $L^1$
A measurable $f$ is integrable when $\int |f|\,d\mu<\infty$. These are exactly the functions whose signed integral is a finite number, which is what linearity of expectation and the strong law of large numbers require. Identifying functions that agree almost everywhere turns the integrable functions into the normed space $L^1(\mu)$.
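Operational definition.
$\mathcal{L}^1(\mu)=\{f \text{ measurable} : \int |f|\,d\mu<\infty\}$, equivalently $\int f^+\,d\mu<\infty$ and $\int f^-\,d\mu<\infty$. Setting $\|f\|_1=\int |f|\,d\mu$ and identifying $f$ with $g$ when $f=g$ $\mu$-almost everywhere gives $L^1(\mu)$, in which $\|f\|_1=0$ exactly when $f=0$ a.e.
Worked reading.
If $f=g$ except on a $\mu$-null set, then $\int f\,d\mu=\int g\,d\mu$: changing a loss on a measure-zero set of inputs cannot change the expected loss. Integrability is a tail condition: $\int|f|\,d\mu<\infty$ fails as soon as too much mass sits where $|f|$ is large.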
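Almost-everywhere identification is easy to see on a finite space that contains a point of measure zero. In this sketch, with a hypothetical measure and two made-up functions that differ only at the null point `d`, the two integrals agree and $\|f-g\|_1=0$ even though $f\neq g$ pointwise.

```python
from fractions import Fraction

mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(0)}  # d is a null point

def integral(h):
    return sum((h[w] * mu[w] for w in mu), Fraction(0))

def l1_norm(h):
    return integral({w: abs(v) for w, v in h.items()})

f = {"a": Fraction(2), "b": Fraction(-3), "c": Fraction(6), "d": Fraction(1)}
g = {"a": Fraction(2), "b": Fraction(-3), "c": Fraction(6), "d": Fraction(10**6)}  # differs only on {d}

diff = {w: f[w] - g[w] for w in mu}
print(f == g, integral(f), integral(g), l1_norm(diff), l1_norm(f))
```

In $L^1(\mu)$ the functions `f` and `g` are the same element, which is why statements about expected loss are insensitive to behavior on null sets.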
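Three examples of integrability and $L^1$:
- A bounded loss under a probability measure, since $\int|\ell|\,dP\le\sup|\ell|$.
- Squared error under a data distribution for which predictions and targets have finite second moments.
- Importance weights $p/q$ under $Q$ when $P\ll Q$, since $\int (p/q)\,dQ=P(\Omega)=1$.
Two non-examples clarify the boundary:
- The identity function under a Cauchy distribution, whose mean does not exist because $\int|x|\,dP=\infty$.
- A function whose positive and negative parts both have infinite integral.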
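A standard distribution that is not in $L^1$ is the Cauchy distribution. For the standard Cauchy density $1/(\pi(1+x^2))$, the truncated absolute moment has the closed form $\int_{-R}^{R}|x|\,\frac{dx}{\pi(1+x^2)}=\frac{1}{\pi}\ln(1+R^2)$, which grows without bound. This sketch checks the formula against a midpoint-rule approximation and shows the growth; the function names are hypothetical.

```python
import math

def truncated_abs_moment(R):
    """Closed form of the integral of |x| / (pi (1 + x^2)) over [-R, R]."""
    return math.log1p(R * R) / math.pi

def midpoint_rule(R, steps=200_000):
    """Numerical check: midpoint rule on [0, R], doubled by symmetry."""
    h = R / steps
    total = sum((k + 0.5) * h / (1.0 + ((k + 0.5) * h) ** 2) for k in range(steps))
    return 2.0 * h * total / math.pi

for R in (10.0, 1e3, 1e6):
    print(R, round(truncated_abs_moment(R), 4))
print(round(midpoint_rule(10.0), 4), round(truncated_abs_moment(10.0), 4))
```

The growth is only logarithmic, which is why a heavy-tailed loss or importance weight can look well behaved on small samples while its sample averages never settle down.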
Proof or verification habit for integrability and $L^1$:
To show $f\in L^1$, bound $|f|$ by a function already known to be integrable. To show that $\|\cdot\|_1$ is a norm on $L^1$, use the triangle inequality $|f+g|\le|f|+|g|$ and the fact that $\int|f|\,d\mu=0$ forces $f=0$ $\mu$-a.e.; the a.e. identification is exactly what makes the last step work.