Chapter 4 — Calculus Fundamentals
"Calculus is the science of measuring and harnessing change — and every gradient descent step, every backpropagation pass, every convergence proof in machine learning is calculus in action."
Overview
This chapter builds the single-variable calculus foundation that underpins all continuous mathematics in machine learning. The progression moves from the rigorous notion of a limit (the definition of "approaching"), through the derivative as the instantaneous rate of change, through integration as accumulation, and finally through series as infinite-precision approximation.
Every concept introduced here has a direct ML counterpart: limits underlie gradient definitions and softmax temperature; derivatives are backpropagation; integration is expectation over probability densities; Taylor series are the theoretical foundation of optimizer update rules and attention approximations. Understanding the mathematical substance — not just the mechanics — separates practitioners who can debug and innovate from those who can only apply recipes.
Subsection Map
| # | Subsection | What It Covers | Canonical Topics |
|---|---|---|---|
| 01 | Limits and Continuity | Rigorous foundation of approaching, convergence, and function regularity | ε-δ definition, limit laws, Squeeze Theorem, L'Hôpital's Rule, one-sided limits, IVT, EVT, continuity, discontinuity types, numerical stability near limits |
| 02 | Derivatives and Differentiation | Instantaneous rate of change; differentiation rules; activation function analysis | Derivative definition, power/product/quotient/chain rules, implicit differentiation, higher-order derivatives, activation function derivatives, critical points, numerical differentiation |
| 03 | Integration | Accumulation and area; the Fundamental Theorem; probabilistic expectations | Riemann integral, antiderivatives, FTC parts I & II, substitution, integration by parts, improper integrals, numerical integration, probability density integration |
| 04 | Series and Sequences | Infinite sums, convergence, and function approximation via Taylor/power series | Sequences, convergence tests, power series, radius of convergence, Taylor series, Maclaurin series, error bounds |
Reading Order and Dependencies
01-Limits-and-Continuity (ε-δ, squeeze, IVT — rigorous foundation)
↓
02-Derivatives-and-Differentiation (derivative as a limit; all rules; activation analysis)
↓
03-Integration (antiderivatives; FTC connects integrals to derivatives)
↓
04-Series-and-Sequences (Taylor series uses derivatives; convergence uses limits)
↓
05-Multivariate-Calculus (next chapter — extends §02 chain rule, §03 Fubini, §04 Taylor)
What Belongs Where — Canonical Homes
This table is the authoritative scoping guide. If a topic has a canonical home, every other section must give at most a 1–2 paragraph preview with a forward/backward reference — never a full treatment.
| Topic | Canonical Home | Preview Only In |
|---|---|---|
| ε-δ definition of limits | §01 | — |
| Limit laws (sum, product, quotient, composition) | §01 | — |
| Squeeze Theorem (proof + applications) | §01 | — |
| L'Hôpital's Rule | §01 | — |
| One-sided limits | §01 | — |
| Limits at infinity and infinite limits | §01 | — |
| Fundamental limits: , , | §01 | §02 (used in derivatives) |
| Continuity: definition, three-part condition | §01 | §02 (differentiability implies continuity) |
| Types of discontinuity: removable, jump, essential | §01 | — |
| Intermediate Value Theorem | §01 | §03 (root existence in FTC proof) |
| Extreme Value Theorem | §01 | §02 (applied to critical points) |
| Uniform continuity, Lipschitz continuity | §01 | — |
Numerical stability near limits (expm1, log1p) | §01 | §02 (numerically stable gradients) |
| Derivative definition (limit of difference quotient) | §02 | §01 (brief forward preview only) |
| Power rule, product rule, quotient rule | §02 | — |
| Chain rule (single-variable) | §02 | §05 (multivariable extension) |
| Implicit differentiation | §02 | — |
| Higher-order derivatives, concavity | §02 | §04 (used in Taylor remainder) |
| Activation function derivatives: sigmoid, tanh, ReLU, GELU | §02 | §01 (continuity of activations only) |
| Critical points, local extrema (first/second derivative test) | §02 | §08-Optimization (multivariable extension) |
| Numerical differentiation / finite differences | §02 | §01 (gradient-as-limit preview only) |
| Related rates, linear approximation | §02 | — |
| Antiderivatives and indefinite integrals | §03 | — |
| Riemann sums and definite integral definition | §03 | — |
| Fundamental Theorem of Calculus (both parts) | §03 | §02 (brief forward reference) |
| Integration by substitution (u-substitution) | §03 | — |
| Integration by parts | §03 | — |
| Partial fractions | §03 | — |
| Improper integrals | §03 | — |
| Numerical integration: trapezoid, Simpson's, Monte Carlo | §03 | — |
| Integration in probability: PDF, CDF, expectation | §03 | §06-Probability (uses these results) |
| Sequences: definition, convergence, Cauchy criterion | §04 | §01 (sequential characterisation of limits) |
| Infinite series, partial sums | §04 | — |
| Convergence tests: ratio, root, comparison, integral | §04 | — |
| Power series, radius of convergence | §04 | — |
| Taylor series (full derivation, remainder, convergence) | §04 | §02 (first-order/linear approximation only) |
| Maclaurin series for , , , | §04 | §01 (used to evaluate limits) |
| Taylor series in ML: optimizer analysis, attention approximations | §04 | — |
Overlap Danger Zones
These topics are the most commonly duplicated across sections. Read this before writing any content.
1. Derivative ↔ Limit
- §01 may preview the derivative as a limit of a difference quotient (1–2 paragraphs, forward ref to §02). It must NOT include differentiation rules, worked differentiation examples, or activation function derivatives.
- §02 owns the full treatment of the derivative: definition, rules, applications.
2. Taylor Expansion ↔ Derivative / Limit
- §02 may use first-order Taylor approximation (linear approximation) because it requires only derivatives. It must NOT prove Taylor's theorem or discuss convergence.
- §01 may use the Taylor expansion of informally to evaluate limits (e.g., ). It must NOT derive the series.
- §04 owns the full treatment: derivation of Taylor series, remainder theorem, convergence radius, ML applications.
3. Chain Rule ↔ Backpropagation
- §02 derives the single-variable chain rule and applies it to simple function compositions and backpropagation through scalar computations.
- §05-Multivariate Calculus owns the multivariable chain rule, Jacobians, and the full backpropagation algorithm for vector-valued functions.
- §02 should forward-reference §05 for the general case.
4. Integration ↔ Probability
- §03 covers integration of probability density functions and the definition of expectation as a worked example of improper integration.
- §06-Probability Theory owns the full treatment of random variables, distributions, and probabilistic reasoning.
- §03 should forward-reference §06 for the probabilistic interpretation; §06 should backward-reference §03 for the integration mechanics.
5. Continuity of Activation Functions
- §01 covers continuity of ReLU, GELU, sigmoid as examples of continuity and discontinuity types (at the origin — is the function continuous? is its derivative continuous?).
- §02 covers the derivatives of those same activation functions (sigmoid', ReLU', GELU').
- Do NOT duplicate: §01 discusses continuity only; §02 discusses differentiability and the derivative formula only.
6. Numerical Differentiation
- §01 previews the gradient as a limit (1–2 paragraphs, as motivation for why limits matter for AI).
- §02 owns numerical differentiation: one-sided vs centered finite differences, error order vs , optimal step size, gradient checking implementation.
Forward and Backward Reference Format
When a topic belongs to another section, use this exact format rather than covering it in depth:
Forward reference (full treatment is later):
> **Preview: Taylor Series**
> Every smooth function can be represented as an infinite polynomial:
> $f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n$.
> The first-order truncation $f(x) \approx f(a) + f'(a)(x-a)$ is the linear approximation
> used throughout optimisation.
>
> → _Full treatment: [Series and Sequences](../04-Series-and-Sequences/notes.md)_
Backward reference (builds on earlier section):
> **Recall:** The derivative $f'(a) = \lim_{h\to 0}[f(a+h)-f(a)]/h$ was defined in
> [Limits and Continuity](../01-Limits-and-Continuity/notes.md#95-gradient-as-a-limit).
> We now develop rules for computing it efficiently.
Key Cross-Chapter Dependencies
From Chapter 3 — Advanced Linear Algebra:
- Matrix derivatives (Jacobians, Hessians) appear in §05-Multivariate Calculus — the present chapter provides the scalar foundation
- Positive definite Hessians (from §07-Positive-Definite-Matrices) appear in §02 second-derivative tests and §08-Optimization
Into Chapter 5 — Multivariate Calculus:
- §01 (limits) → multivariable limits and partial derivative definition
- §02 (chain rule) → multivariable chain rule, Jacobians, backpropagation
- §03 (integration) → double/triple integrals, Fubini's theorem
- §04 (Taylor series) → multivariate Taylor expansion, Hessian as second-order term
Into Chapter 8 — Optimization:
- §02 (critical points) → gradient descent, Newton's method
- §02 (chain rule) → backpropagation
- §04 (Taylor series) → momentum, Adam, trust-region analysis
Into Chapter 6 — Probability Theory:
- §03 (integration) → continuous probability, PDF/CDF, expectation, variance
ML Concept Map
| ML Concept | Calculus Foundation | Section |
|---|---|---|
| Backpropagation | Chain rule applied to computation graph | §02 |
| Gradient descent update | Derivative of loss w.r.t. parameters | §02 |
| Softmax temperature () | Limits at zero | §01 |
| Vanishing gradient | as | §01, §02 |
| Expected loss | Integration over data distribution | §03 |
| KL divergence, entropy | Improper integrals of | §03 |
| Adam optimizer (second moment) | Power series expansion of update rule | §04 |
| Attention kernel approximation | Taylor expansion of | §04 |
| Learning rate convergence (Robbins-Monro) | Series convergence conditions | §01, §04 |
| Numerical gradient checking | Finite difference approximation of limit | §02 |
| Monte Carlo expectation | Numerical integration via sampling | §03 |
| Log-sum-exp stability | Limits near singularities | §01 |
Prerequisites
Before starting this chapter, ensure you are comfortable with:
- Real number system: , absolute value, inequalities — §01-Mathematical-Foundations
- Functions: domain, codomain, composition, inverse — §01-Mathematical-Foundations
- Algebra: polynomial factoring, rational expressions, exponential/logarithmic identities
- Trigonometry: , , and their basic identities
- Matrix operations: matrix multiply, transpose — §02-Linear-Algebra-Basics (for later ML applications)
← Previous Chapter: Advanced Linear Algebra | Next Chapter: Multivariate Calculus →