Chapter 4 — Calculus Fundamentals

"Calculus is the science of measuring and harnessing change — and every gradient descent step, every backpropagation pass, every convergence proof in machine learning is calculus in action."

Overview

This chapter builds the single-variable calculus foundation that underpins all continuous mathematics in machine learning. The progression moves from the rigorous notion of a limit (the precise meaning of "approaching"), through the derivative as instantaneous rate of change and the integral as accumulation, to series as arbitrarily precise approximation.

Every concept introduced here has a direct ML counterpart: limits underlie gradient definitions and softmax temperature; derivatives are backpropagation; integration is expectation over probability densities; Taylor series are the theoretical foundation of optimizer update rules and attention approximations. Understanding the mathematical substance — not just the mechanics — separates practitioners who can debug and innovate from those who can only apply recipes.


Subsection Map

| # | Subsection | What It Covers | Canonical Topics |
|---|---|---|---|
| 01 | Limits and Continuity | Rigorous foundation of approaching, convergence, and function regularity | ε-δ definition, limit laws, Squeeze Theorem, L'Hôpital's Rule, one-sided limits, IVT, EVT, continuity, discontinuity types, numerical stability near limits |
| 02 | Derivatives and Differentiation | Instantaneous rate of change; differentiation rules; activation function analysis | Derivative definition, power/product/quotient/chain rules, implicit differentiation, higher-order derivatives, activation function derivatives, critical points, numerical differentiation |
| 03 | Integration | Accumulation and area; the Fundamental Theorem; probabilistic expectations | Riemann integral, antiderivatives, FTC parts I & II, substitution, integration by parts, improper integrals, numerical integration, probability density integration |
| 04 | Series and Sequences | Infinite sums, convergence, and function approximation via Taylor/power series | Sequences, convergence tests, power series, radius of convergence, Taylor series, Maclaurin series, error bounds |

Reading Order and Dependencies

```
01-Limits-and-Continuity        (ε-δ, squeeze, IVT — rigorous foundation)
        ↓
02-Derivatives-and-Differentiation  (derivative as a limit; all rules; activation analysis)
        ↓
03-Integration                  (antiderivatives; FTC connects integrals to derivatives)
        ↓
04-Series-and-Sequences         (Taylor series uses derivatives; convergence uses limits)
        ↓
05-Multivariate-Calculus        (next chapter — extends §02 chain rule, §03 Fubini, §04 Taylor)
```

What Belongs Where — Canonical Homes

This table is the authoritative scoping guide. If a topic has a canonical home, every other section must give at most a 1–2 paragraph preview with a forward/backward reference — never a full treatment.

| Topic | Canonical Home | Preview Only In |
|---|---|---|
| ε-δ definition of limits | §01 | |
| Limit laws (sum, product, quotient, composition) | §01 | |
| Squeeze Theorem (proof + applications) | §01 | |
| L'Hôpital's Rule | §01 | |
| One-sided limits | §01 | |
| Limits at infinity and infinite limits | §01 | |
| Fundamental limits: $\sin(x)/x$, $(e^x-1)/x$, $(1+1/n)^n$ | §01 | §02 (used in derivatives) |
| Continuity: definition, three-part condition | §01 | §02 (differentiability implies continuity) |
| Types of discontinuity: removable, jump, essential | §01 | |
| Intermediate Value Theorem | §01 | §03 (root existence in FTC proof) |
| Extreme Value Theorem | §01 | §02 (applied to critical points) |
| Uniform continuity, Lipschitz continuity | §01 | |
| Numerical stability near limits (expm1, log1p) | §01 | §02 (numerically stable gradients) |
| Derivative definition (limit of difference quotient) | §02 | §01 (brief forward preview only) |
| Power rule, product rule, quotient rule | §02 | |
| Chain rule (single-variable) | §02 | §05 (multivariable extension) |
| Implicit differentiation | §02 | |
| Higher-order derivatives, concavity | §02 | §04 (used in Taylor remainder) |
| Activation function derivatives: sigmoid, tanh, ReLU, GELU | §02 | §01 (continuity of activations only) |
| Critical points, local extrema (first/second derivative test) | §02 | §08-Optimization (multivariable extension) |
| Numerical differentiation / finite differences | §02 | §01 (gradient-as-limit preview only) |
| Related rates, linear approximation | §02 | |
| Antiderivatives and indefinite integrals | §03 | |
| Riemann sums and definite integral definition | §03 | |
| Fundamental Theorem of Calculus (both parts) | §03 | §02 (brief forward reference) |
| Integration by substitution (u-substitution) | §03 | |
| Integration by parts | §03 | |
| Partial fractions | §03 | |
| Improper integrals | §03 | |
| Numerical integration: trapezoid, Simpson's, Monte Carlo | §03 | |
| Integration in probability: PDF, CDF, expectation | §03 | §06-Probability (uses these results) |
| Sequences: definition, convergence, Cauchy criterion | §04 | §01 (sequential characterisation of limits) |
| Infinite series, partial sums | §04 | |
| Convergence tests: ratio, root, comparison, integral | §04 | |
| Power series, radius of convergence | §04 | |
| Taylor series (full derivation, remainder, convergence) | §04 | §02 (first-order/linear approximation only) |
| Maclaurin series for $e^x$, $\sin x$, $\cos x$, $\ln(1+x)$ | §04 | §01 (used to evaluate limits) |
| Taylor series in ML: optimizer analysis, attention approximations | §04 | |

Overlap Danger Zones

These topics are the most commonly duplicated across sections. Read this before writing any content.

1. Derivative ↔ Limit

  • §01 may preview the derivative as a limit of a difference quotient (1–2 paragraphs, forward ref to §02), as in the worked limit below. It must NOT include differentiation rules, worked differentiation examples, or activation function derivatives.
  • §02 owns the full treatment of the derivative: definition, rules, applications.
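
For concreteness, the kind of worked limit that fits inside a §01 preview derives a single derivative straight from the difference quotient, without invoking any differentiation rules. Taking $f(x) = x^2$ as the example:

$$f'(x) = \lim_{h\to 0}\frac{(x+h)^2 - x^2}{h} = \lim_{h\to 0}\frac{2xh + h^2}{h} = \lim_{h\to 0}(2x + h) = 2x.$$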

2. Taylor Expansion ↔ Derivative / Limit

  • §02 may use the first-order Taylor approximation $f(x) \approx f(a) + f'(a)(x-a)$ (linear approximation) because it requires only derivatives; see the sketch after this list. It must NOT prove Taylor's theorem or discuss convergence.
  • §01 may use the Taylor expansion of $e^x$ informally to evaluate limits (e.g., $(e^x-1)/x$). It must NOT derive the series.
  • §04 owns the full treatment: derivation of Taylor series, remainder theorem, convergence radius, ML applications.
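
A minimal Python sketch of this split, with the function ($e^x$), expansion point ($a = 0$), and test inputs chosen purely for illustration: the linear approximation is the only piece §02 needs, while the higher-order truncation and its error behaviour are §04 material.

```python
import math

# Linear (first-order Taylor) approximation of exp(x) around a = 0: f(a) + f'(a)(x - a).
# This linear truncation is the only Taylor machinery §02 uses.
def linear_approx_exp(x, a=0.0):
    return math.exp(a) + math.exp(a) * (x - a)

# Fourth-order Maclaurin truncation; its derivation, remainder, and convergence belong to §04.
def taylor4_exp(x):
    return sum(x ** n / math.factorial(n) for n in range(5))

for x in (0.1, 0.5, 1.0):
    exact = math.exp(x)
    print(f"x={x:<4} linear err={abs(exact - linear_approx_exp(x)):.2e}  "
          f"4th-order err={abs(exact - taylor4_exp(x)):.2e}")
```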

3. Chain Rule ↔ Backpropagation

  • §02 derives the single-variable chain rule and applies it to simple function compositions and backpropagation through scalar computations (see the sketch after this list).
  • §05-Multivariate Calculus owns the multivariable chain rule, Jacobians, and the full backpropagation algorithm for vector-valued functions.
  • §02 should forward-reference §05 for the general case.
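
A sketch of the scalar case that stays within §02's scope; the toy loss, input, target, and weight values below are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy scalar "network": L(w) = (sigmoid(w * x) - y)^2
x, y, w = 1.5, 0.0, 0.8

z = w * x                      # inner function
a = sigmoid(z)                 # activation
L = (a - y) ** 2               # squared-error loss

# Single-variable chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)          # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
dz_dw = x
grad = dL_da * da_dz * dz_dw

# Centered-difference check (finite differences are §02 territory; see danger zone 6)
h = 1e-6
numeric = ((sigmoid((w + h) * x) - y) ** 2 - (sigmoid((w - h) * x) - y) ** 2) / (2 * h)
print(grad, numeric)           # the two values agree to roughly 1e-10
```

Replacing the scalars with vectors and the scalar derivatives with Jacobians gives the general backpropagation algorithm that §05 develops.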

4. Integration ↔ Probability

  • §03 covers integration of probability density functions and the definition of expectation $\mathbb{E}[X] = \int x\, p(x)\, dx$ as a worked example of improper integration; see the sketch after this list.
  • §06-Probability Theory owns the full treatment of random variables, distributions, and probabilistic reasoning.
  • §03 should forward-reference §06 for the probabilistic interpretation; §06 should backward-reference §03 for the integration mechanics.
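
A minimal numerical sketch of that worked example, assuming an exponential density $p(x) = \lambda e^{-\lambda x}$ on $[0, \infty)$; the rate $\lambda = 2$ and the truncation point for the improper integral are arbitrary illustrative choices.

```python
import math

lam = 2.0                                    # rate of the exponential density

def integrand(x):
    return x * lam * math.exp(-lam * x)      # x * p(x)

def trapezoid(f, a, b, n=100_000):
    # Composite trapezoid rule on [a, b] with n subintervals (§03 numerical integration).
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

# E[X] = ∫_0^∞ x p(x) dx is improper; truncating at a large upper limit approximates it.
# The analytic value is 1 / lam = 0.5.
print(trapezoid(integrand, 0.0, 50.0))       # ≈ 0.5
```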

5. Continuity of Activation Functions

  • §01 covers continuity of ReLU, GELU, sigmoid as examples of continuity and discontinuity types (at the origin — is the function continuous? is its derivative continuous?); see the sketch after this list.
  • §02 covers the derivatives of those same activation functions (sigmoid', ReLU', GELU').
  • Do NOT duplicate: §01 discusses continuity only; §02 discusses differentiability and the derivative formula only.
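
A quick sketch of how the split plays out for ReLU (the step size h is an arbitrary small value): the function is continuous at the origin, which is §01's concern, while its one-sided difference quotients disagree there, which is §02's.

```python
def relu(x):
    return max(0.0, x)

h = 1e-8

# §01 view: values just left of, at, and just right of the origin all tend to 0,
# so ReLU is continuous there.
print(relu(-h), relu(0.0), relu(h))

# §02 view: the one-sided difference quotients at 0 disagree, so ReLU'(0) does not exist
# and the derivative has a jump.
left_slope = (relu(0.0) - relu(-h)) / h      # -> 0
right_slope = (relu(h) - relu(0.0)) / h      # -> 1
print(left_slope, right_slope)
```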

6. Numerical Differentiation

  • §01 previews the gradient as a limit (1–2 paragraphs, as motivation for why limits matter for AI).
  • §02 owns numerical differentiation: one-sided vs centered finite differences, error order $O(h)$ vs $O(h^2)$, optimal step size, gradient checking implementation; a short comparison sketch follows this list.
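
A short sketch of the $O(h)$ vs $O(h^2)$ contrast; the test function $\sin$ and evaluation point are arbitrary choices. Shrinking h by a factor of 10 cuts the forward-difference error by roughly 10 and the centered-difference error by roughly 100.

```python
import math

# Forward vs centered finite differences for f(x) = sin(x) at x0 = 1.0.
f, x0 = math.sin, 1.0
true_derivative = math.cos(x0)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    forward = (f(x0 + h) - f(x0)) / h                # error shrinks like O(h)
    centered = (f(x0 + h) - f(x0 - h)) / (2 * h)     # error shrinks like O(h^2)
    print(f"h={h:.0e}  forward err={abs(forward - true_derivative):.1e}  "
          f"centered err={abs(centered - true_derivative):.1e}")
```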

Forward and Backward Reference Format

When a topic belongs to another section, use this exact format rather than covering it in depth:

Forward reference (full treatment is later):

> **Preview: Taylor Series**
> Every smooth function can be represented as an infinite polynomial:
> $f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n$.
> The first-order truncation $f(x) \approx f(a) + f'(a)(x-a)$ is the linear approximation
> used throughout optimisation.
>
> → _Full treatment: [Series and Sequences](../04-Series-and-Sequences/notes.md)_

Backward reference (builds on earlier section):

> **Recall:** The derivative $f'(a) = \lim_{h\to 0}[f(a+h)-f(a)]/h$ was defined in
> [Limits and Continuity](../01-Limits-and-Continuity/notes.md#95-gradient-as-a-limit).
> We now develop rules for computing it efficiently.

Key Cross-Chapter Dependencies

From Chapter 3 — Advanced Linear Algebra:

Into Chapter 5 — Multivariate Calculus:

  • §01 (limits) → multivariable limits and partial derivative definition
  • §02 (chain rule) → multivariable chain rule, Jacobians, backpropagation
  • §03 (integration) → double/triple integrals, Fubini's theorem
  • §04 (Taylor series) → multivariate Taylor expansion, Hessian as second-order term

Into Chapter 8 — Optimization:

  • §02 (critical points) → gradient descent, Newton's method
  • §02 (chain rule) → backpropagation
  • §04 (Taylor series) → momentum, Adam, trust-region analysis

Into Chapter 6 — Probability Theory:

  • §03 (integration) → continuous probability, PDF/CDF, expectation, variance

ML Concept Map

| ML Concept | Calculus Foundation | Section |
|---|---|---|
| Backpropagation | Chain rule applied to computation graph | §02 |
| Gradient descent update | Derivative of loss w.r.t. parameters | §02 |
| Softmax temperature ($T \to 0$) | Limits at zero | §01 |
| Vanishing gradient | $\sigma'(x) \to 0$ as $x \to \pm\infty$ | §01, §02 |
| Expected loss $\mathbb{E}[\mathcal{L}]$ | Integration over data distribution | §03 |
| KL divergence, entropy | Improper integrals of $p \log p$ | §03 |
| Adam optimizer (second moment) | Power series expansion of update rule | §04 |
| Attention kernel approximation | Taylor expansion of $e^{qk^\top}$ | §04 |
| Learning rate convergence (Robbins-Monro) | Series convergence conditions | §01, §04 |
| Numerical gradient checking | Finite difference approximation of limit | §02 |
| Monte Carlo expectation | Numerical integration via sampling | §03 |
| Log-sum-exp stability | Limits near singularities | §01 |
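
To make one row of the table concrete, a small sketch of the softmax-temperature limit; the logits and temperatures below are made-up values. As $T \to 0$ the distribution collapses onto the largest logit, and the max-subtraction inside the function is the same trick behind log-sum-exp stability in the last row.

```python
import math

def softmax(logits, T):
    m = max(logits)                                   # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for T in (1.0, 0.1, 0.01):
    print(T, [round(p, 4) for p in softmax(logits, T)])   # T -> 0: tends to a one-hot argmax
```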

Prerequisites

Before starting this chapter, ensure you are comfortable with:

  • Real number system: $\mathbb{R}$, absolute value, inequalities — §01-Mathematical-Foundations
  • Functions: domain, codomain, composition, inverse — §01-Mathematical-Foundations
  • Algebra: polynomial factoring, rational expressions, exponential/logarithmic identities
  • Trigonometry: $\sin$, $\cos$, $\tan$ and their basic identities
  • Matrix operations: matrix multiply, transpose — §02-Linear-Algebra-Basics (for later ML applications)

← Previous Chapter: Advanced Linear Algebra | Next Chapter: Multivariate Calculus →