
Mathematics for AI / ML / LLM

The Complete Math Foundation You Need to Master AI

A structured, comprehensive curriculum covering 25 domains of mathematics essential for AI, Machine Learning, and Large Language Models, from foundational concepts to cutting-edge research.



Why This Course?

Most AI/ML learners hit the math wall: papers full of symbols that feel alien, optimization steps that seem like magic, and model architectures that assume deep mathematical fluency.

This course bridges that gap with a learn-by-doing approach:

  • Structured path from high-school math to research-level topics
  • Notes → Theory → Exercises flow for every topic
  • Interactive Jupyter notebooks with visualizations, not just formulas
  • Real ML connections: every concept links to practical AI applications
  • Self-contained: no prerequisites beyond basic algebra

"The math you need depends on what you're building; this course helps you find exactly that."


🚀 Learning Flow

Each topic in the curriculum follows a structured 3-step learning flow designed to build deep intuition and practical skill:

📖 notes.md          → Read the concepts, formulas, and intuition
🔬 theory.ipynb      → Explore interactive Python/Jupyter code demonstrations
✍️ exercises.ipynb   → Solve practice problems to test your understanding

Use the left sidebar navigation to explore topics, or follow the Learning Roadmap below.

🗺️ Learning Roadmap

The curriculum covers 25 domains organized in 8 phases. Each phase builds on the previous one. Topics marked with ★ are critical for ML/AI.

  START HERE
      │
      ▼
 ┌──────────┐     ┌─────────────┐     ┌──────────┐     ┌──────────────┐
 │ Phase 1  │ ──▶ │   Phase 2   │ ──▶ │ Phase 3  │ ──▶ │   Phase 4    │
 │ Core     │     │ Probability │     │ Learning │     │ Deeper       │
 │ Math     │     │ & Stats     │     │ Engines  │     │ Theory       │
 └──────────┘     └─────────────┘     └──────────┘     └──────────────┘
                                                              │
      ┌───────────────────────────────────────────────────────┘
      ▼
 ┌──────────┐     ┌──────────┐     ┌──────────────┐     ┌──────────────┐
 │ Phase 5  │ ──▶ │ Phase 6  │ ──▶ │   Phase 7    │ ──▶ │   Phase 8    │
 │ ML Math  │     │ LLM Math │     │ Production   │     │ Research     │
 │          │     │          │     │ & Safety     │     │ Frontiers    │
 └──────────┘     └──────────┘     └──────────────┘     └──────────────┘

Phase 1: Core Math Foundations (Ch. 01 → 02 → 04 → 05)

The mathematical language everything else is built on.

① MATHEMATICAL FOUNDATIONS (Ch.01)
├── Number Systems (ℕ ℤ ℚ ℝ ℂ)
├── Sets & Logic
├── Functions & Mappings
├── Σ Summation & Product Notation
├── Einstein Summation & Index Notation
└── Proof Techniques (induction, contradiction, direct)
        │
        ├───────────────────────────────────┐
        ▼                                   ▼
② LINEAR ALGEBRA (Ch.02 + Ch.03)            ③ CALCULUS (Ch.04 + Ch.05)
├── Vectors & Spaces                        ├── Limits & Continuity
├── Matrix Operations                       ├── Derivatives & Differentiation
├── Systems of Equations                    ├── Integration & Series
├── Determinants & Rank                     ├── Partial Derivatives & Gradients
├── Eigenvalues & Eigenvectors ★            ├── Jacobians & Hessians ★
├── SVD ★                                   ├── Chain Rule → Backpropagation ★
├── PCA                                     ├── Optimality Conditions
├── Orthogonality & Norms                   └── Automatic Differentiation ★
├── Positive Definite Matrices
└── Matrix Decompositions (LU/QR/Cholesky)

Phase 2: Probabilistic Thinking (Ch. 06 → 07)

How to reason about uncertainty, the foundation of all ML inference.

④ PROBABILITY & STATISTICS (Ch.06 + Ch.07)
├── Random Variables & Distributions
├── Joint Distributions
├── Expectation & Moments
├── Concentration Inequalities
├── Stochastic Processes
├── Markov Chains ★
├── Descriptive Statistics
├── Estimation Theory & MLE
├── Bayesian Inference ★
├── Hypothesis Testing
├── Time Series
└── Regression Analysis

Phase 3: Making Models Learn (Ch. 08 → 09)

The algorithms that train every model and the theory behind loss functions.

⑤ OPTIMIZATION (Ch.08)                     ⑥ INFORMATION THEORY (Ch.09)
├── Convex Optimization ★                  ├── Entropy (Shannon) ★
├── Gradient Descent (SGD/Mini-batch) ★    ├── KL Divergence ★
├── Second-Order Methods (Newton/BFGS)     ├── Mutual Information ★
├── Constrained Optimization (KKT)         ├── Cross-Entropy ★
├── Stochastic Optimization ★              └── Fisher Information
├── Optimization Landscape
├── Adaptive LR (Adam / RMSProp) ★
├── Regularization (L1/L2/Dropout)
├── Hyperparameter Optimization
└── Learning Rate Schedules

Phase 4: Deeper Theory (Ch. 03 → 10 → 11 → 12)

Specialized math that powers specific ML architectures.

⑦ NUMERICAL METHODS (Ch.10)                ⑧ GRAPH THEORY (Ch.11)
├── Floating-Point Arithmetic              ├── Graph Basics & Representations
├── Numerical Linear Algebra               ├── Graph Algorithms
├── Numerical Optimization                 ├── Spectral Graph Theory ★
├── Interpolation & Approximation          ├── Graph Neural Networks ★
└── Numerical Integration                  └── Random Graphs

⑨ FUNCTIONAL ANALYSIS (Ch.12)
├── Normed Spaces
├── Hilbert Spaces ★
└── Kernel Methods (SVM / GP) ★

Phase 5: ML Math in Practice (Ch. 13 → 14)

The math that directly appears inside ML models.

                        ╔════════════════════╗
                        ║  ML-SPECIFIC MATH  ║
                        ╚════════════════════╝
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
⑩ ML MATH CORE (Ch.13)   ⑪ DEEP LEARNING (Ch.14)   ⑫ RL (Ch.14)
├── Loss Functions ★      ├── Neural Net Math ★      ├── MDP (State/Action)
├── Activation Fns ★      ├── CNN & Convolution ★    ├── Bellman Equations ★
├── Normalization ★       ├── RNN & LSTM Math ★      ├── Policy Gradient ★
└── Sampling Methods      ├── Transformer ★          ├── Value Functions ★
                          ├── Generative (VAE/GAN)   └── Actor-Critic
                          └── Probabilistic Models

Phase 6: LLM Math (Ch. 15 → 16)

Everything that makes Large Language Models work under the hood.

                         ╔══════════════════╗
                         ║  MATH FOR LLMs   ║
                         ╚══════════════════╝
                                  │
       ┌──────────────────────────┼──────────────────────────┐
       ▼                          ▼                          ▼
⑬ ATTENTION & ARCH         ⑭ TRAINING AT SCALE       ⑮ DATA PIPELINE
   (Ch.15)                    (Ch.15)                    (Ch.16)
├── Tokenization            ├── Scaling Laws ★         ├── Data Format Standards
├── Embedding Space         ├── Training at Scale      ├── JSONL Generation
├── Attention Mech ★        ├── Efficient Attention    ├── Quality Checks
├── Positional Enc ★        ├── MoE & Routing          ├── Dataset Assembly
└── LM Probability ★        ├── Quantization           ├── Contamination & Dedup
                            ├── Distillation           ├── Documentation
                            └── RAG & Retrieval        └── Data Mixture Optimization

Phase 7: Production & Safety (Ch. 17 → 18 → 19)

Ship models responsibly: evaluate, align, and monitor.

⑯ EVALUATION (Ch.17)           ⑰ ALIGNMENT & SAFETY (Ch.18)     ⑱ PRODUCTION (Ch.19)
├── Capability Benchmarks      ├── SFT Math ★                    ├── Data Versioning & Lineage
├── Calibration & Uncertainty  ├── RLHF Math ★                   ├── Experiment Tracking
├── Robustness &               ├── DPO / Preference Opt ★        ├── Feature Stores &
│   Distribution Shift         ├── Policy & Guardrails           │   Data Contracts
├── Error Analysis &           └── Human-in-the-Loop             ├── Model Serving &
│   Ablations                      & Monitoring                  │   Inference Optimization
└── A/B Testing &                                                ├── Monitoring, Drift
    Experimentation                                              │   & Retraining
                                                                 └── LLM Observability
                                                                     & Guardrails

Phase 8: Research Frontiers (Ch. 20 → 21 → 22 → 23 → 24 → 25)

Advanced theory for research-level understanding.

⑲ FOURIER & SIGNALS (Ch.20)    ⑳ STATISTICAL LEARNING (Ch.21)   ㉑ CAUSAL INFERENCE (Ch.22)
├── Fourier Series              ├── PAC Learning                  ├── Structural Causal Models
├── Fourier Transform           ├── VC Dimension                  ├── Do-Calculus
├── DFT & FFT                   ├── Bias-Variance Tradeoff        ├── Counterfactuals
├── Convolution Theorem         ├── Generalization Bounds         └── Causal Discovery
└── Wavelets                    └── Rademacher Complexity

㉒ GAME THEORY (Ch.23)          ㉓ MEASURE THEORY (Ch.24)         ㉔ DIFF. GEOMETRY (Ch.25)
├── Nash Equilibria             ├── Sigma-Algebras                ├── Manifolds
├── Minimax Theorem             ├── Lebesgue Integration          ├── Riemannian Geometry
├── Multi-Agent Systems         ├── Probability Measure Spaces    ├── Geodesics
└── Adversarial Game Theory     └── Radon-Nikodym Theorem         └── Optimization on Manifolds

╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║        ★  YOU ARE NOW A MATH-FOR-AI WIZARD  ★                 ║
║                                                               ║
║        ★ = Critical for ML/AI: prioritize these topics        ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝

Quick Reference: Learning Order

| Phase | Chapters | Focus |
| :--- | :--- | :--- |
| Phase 1: Core Foundations | 01 → 02 → 04 → 05 | Numbers, vectors, matrices, derivatives, gradients |
| Phase 2: Probabilistic Thinking | 06 → 07 | Random variables, distributions, estimation, inference |
| Phase 3: Making Models Learn | 08 → 09 | Optimization algorithms, information-theoretic losses |
| Phase 4: Deeper Theory | 03 → 10 → 11 → 12 | Advanced linear algebra, numerical methods, graphs, kernels |
| Phase 5: ML Math in Practice | 13 → 14 | Loss functions, activations, architecture-specific math |
| Phase 6: LLM Math | 15 → 16 | Attention, embeddings, scaling laws, training pipelines |
| Phase 7: Production & Safety | 17 → 18 → 19 | Evaluation, alignment (RLHF/DPO), MLOps |
| Phase 8: Research Frontiers | 20 → 21 → 22 → 23 → 24 → 25 | Fourier analysis, learning theory, causality, geometry |

📚 Chapters

Core Mathematics

01 · Mathematical Foundations: Number systems, sets, logic, proofs

| Topic | Description |
| :--- | :--- |
| Number Systems | Natural, integer, rational, real, and complex numbers (ℕ, ℤ, ℚ, ℝ, ℂ) |
| Sets & Logic | Set operations, propositional logic, quantifiers |
| Functions & Mappings | Domain, range, injectivity, surjectivity, composition |
| Summation & Product Notation | Sigma/Pi notation, index manipulation |
| Einstein Summation | Index notation used in tensor operations |
| Proof Techniques | Induction, contradiction, direct proof, contrapositive |

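
Summation and Einstein notation map almost one-to-one onto array code. The minimal sketch below uses NumPy's `np.einsum` as the illustration tool, with random placeholder matrices. It computes the matrix product `C[i, k] = Σ_j A[i, j] · B[j, k]` twice: once with an explicit loop over the summed index `j`, and once with the index string `"ij,jk->ik"`, where the repeated index `j` is summed implicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# Sigma notation: C[i, k] = sum over j of A[i, j] * B[j, k]
C_loop = np.zeros((3, 2))
for i in range(3):
    for k in range(2):
        C_loop[i, k] = sum(A[i, j] * B[j, k] for j in range(4))

# Einstein notation: the index j appears in both operands, so it is summed away
C_einsum = np.einsum("ij,jk->ik", A, B)

# Two other patterns that show up in ML code
M = A @ A.T                                   # 3 x 3 square matrix
trace_M = np.einsum("ii->", M)                # repeated index on one operand: trace
X = rng.standard_normal((5, 3, 4))            # a batch of 5 matrices
XB = np.einsum("bij,jk->bik", X, B)           # same B applied to every batch item

print(np.allclose(C_loop, C_einsum))  # True
```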
02 · Linear Algebra Basics: Vectors, matrices, systems of equations

| Topic | ML Connection |
| :--- | :--- |
| Vectors & Spaces | Feature representations, embeddings |
| Matrix Operations | Forward propagation, transformations |
| Systems of Equations | Linear regression (normal equations) |
| Determinants | Change of variables in normalizing flows |
| Matrix Rank | Model capacity, low-rank approximations |
| Vector Spaces & Subspaces | Dimensionality, feature spaces |

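
The Systems of Equations row points at the normal equations of linear regression, `(XᵀX) w = Xᵀy`. The sketch below assumes synthetic data with a bias column and a full-column-rank design matrix. It solves that small square system with `np.linalg.solve` rather than forming an explicit inverse, then cross-checks against `np.linalg.lstsq`. The least-squares routine never forms `XᵀX`, which squares the condition number, so it is the safer choice when features are nearly collinear.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # bias column + 2 features
w_true = np.array([1.0, -2.0, 0.5])                            # made-up ground truth
y = X @ w_true + 0.1 * rng.standard_normal(n)                   # small Gaussian noise

# Normal equations: solve the 3 x 3 system (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's SVD-based least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal, w_lstsq)
```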
03 · Advanced Linear Algebra: Eigendecomposition, SVD, PCA

| Topic | ML Connection |
| :--- | :--- |
| Eigenvalues & Eigenvectors | PCA, spectral clustering, stability analysis |
| Singular Value Decomposition | Recommender systems, dimensionality reduction |
| Principal Component Analysis | Feature extraction, data compression |
| Linear Transformations | Neural network layers as transforms |
| Orthogonality & Orthonormality | Gram-Schmidt, decorrelated features |
| Matrix Norms | Regularization, operator bounds |
| Positive Definite Matrices | Covariance matrices, kernel validity |
| Matrix Decompositions | LU, QR, Cholesky: efficient solvers |

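
The SVD and low-rank rows meet in the Eckart–Young theorem. Keeping the top k singular values gives the best rank-k approximation in the Frobenius norm, and the approximation error equals the square root of the sum of the discarded squared singular values. The short check below uses a random matrix as illustrative data, and k = 5 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 30))     # placeholder data matrix

# Thin SVD: M = U @ diag(s) @ Vt, with s sorted in descending order
U, s, Vt = np.linalg.svd(M, full_matrices=False)

def rank_k_approx(k):
    """Keep only the top-k singular triplets."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

k = 5
M_k = rank_k_approx(k)
err = np.linalg.norm(M - M_k, "fro")
predicted = np.sqrt(np.sum(s[k:] ** 2))   # Eckart-Young: norm of the discarded singular values

print(np.linalg.matrix_rank(M_k), err, predicted)
```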
04 · Calculus Fundamentals: Limits, derivatives, integrals, series

| Topic | ML Connection |
| :--- | :--- |
| Limits & Continuity | Convergence guarantees, activation smoothness |
| Derivatives & Differentiation | Gradient computation for all parameters |
| Integration | Probability densities, normalization constants |
| Series & Sequences | Taylor approximations, convergence analysis |

05 · Multivariate Calculus: Gradients, Jacobians, backpropagation

| Topic | ML Connection |
| :--- | :--- |
| Partial Derivatives & Gradients | Direction of steepest ascent (gradient descent steps against it) |
| Jacobians & Hessians | Multi-output functions, second-order methods |
| Chain Rule & Backpropagation | Training every neural network |
| Optimality Conditions | Convergence criteria, saddle points |
| Automatic Differentiation | PyTorch autograd, JAX |

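
Backpropagation is the chain rule applied from the loss back to each parameter. The minimal sketch below uses a hypothetical one-hidden-layer network with a tanh activation, a scalar output, squared-error loss, and random weights. The backward pass multiplies local derivatives in reverse order, and a central finite-difference check on `W1` confirms the analytic gradient. The names `forward` and `backward` are illustrative, not a library API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)           # one input example
y = 0.7                              # scalar target
W1 = rng.standard_normal((4, 3))     # hidden-layer weights (4 hidden units)
w2 = rng.standard_normal(4)          # output weights

def forward(W1, w2):
    z = W1 @ x                       # pre-activation
    h = np.tanh(z)                   # hidden activation
    y_hat = w2 @ h                   # scalar prediction
    loss = 0.5 * (y_hat - y) ** 2
    return loss, (z, h, y_hat)

def backward(W1, w2):
    _, (z, h, y_hat) = forward(W1, w2)
    d_yhat = y_hat - y                   # dL/dy_hat
    d_w2 = d_yhat * h                    # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_h = d_yhat * w2                    # dL/dh
    d_z = d_h * (1.0 - np.tanh(z) ** 2)  # chain through tanh'(z) = 1 - tanh(z)^2
    d_W1 = np.outer(d_z, x)              # dL/dW1[i, j] = d_z[i] * x[j]
    return d_W1, d_w2

d_W1, d_w2 = backward(W1, w2)

# Central finite differences, one entry of W1 at a time
eps = 1e-6
num_W1 = np.zeros_like(W1)
for idx in np.ndindex(*W1.shape):
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    num_W1[idx] = (forward(W_plus, w2)[0] - forward(W_minus, w2)[0]) / (2 * eps)

print(np.max(np.abs(num_W1 - d_W1)))  # max absolute gap between the two gradients
```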

Probability, Statistics & Optimization

06 · Probability Theory: Distributions, expectations, stochastic processes

| Topic | ML Connection |
| :--- | :--- |
| Random Variables | Output uncertainty, stochastic models |
| Common Distributions | Gaussian, Bernoulli, Poisson: model assumptions |
| Joint Distributions | Multivariate modeling, copulas |
| Expectation & Moments | Loss functions, feature statistics |
| Concentration Inequalities | Generalization bounds, sample complexity |
| Stochastic Processes | Time series, diffusion models |
| Markov Chains | MCMC sampling, language modeling |

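
For the Markov Chains row, the key object is the stationary distribution π satisfying `π = πP`. The sketch below uses a hypothetical 3-state transition matrix whose rows sum to 1. Two routes give the same answer: power iteration from a uniform start, and the left eigenvector of `P` for eigenvalue 1. For this particular matrix, that answer is π = (0.625, 0.3125, 0.0625).

```python
import numpy as np

# Hypothetical transition matrix: P[i, j] = Pr(next state = j | current state = i)
P = np.array([
    [0.90, 0.075, 0.025],
    [0.15, 0.80, 0.05],
    [0.25, 0.25, 0.50],
])
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution

# Power iteration: start uniform and repeatedly apply pi <- pi @ P
pi = np.full(3, 1.0 / 3.0)
for _ in range(1000):
    pi = pi @ P

# Same answer from the left eigenvector of P for eigenvalue 1 (eigenvector of P.T)
eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi_eig = v / v.sum()                     # normalize so the entries sum to 1

print(pi, pi_eig)
```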
07 · Statistics: Estimation, testing, Bayesian inference, regression

| Topic | ML Connection |
| :--- | :--- |
| Descriptive Statistics | EDA, feature engineering |
| Estimation Theory | MLE, MAP: training as estimation |
| Hypothesis Testing | A/B testing, model comparison |
| Bayesian Inference | Posterior updates, uncertainty quantification |
| Time Series | Sequence forecasting, temporal patterns |
| Regression Analysis | Baseline models, diagnostics |

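
Here is Bayesian inference in its simplest conjugate form. With a Beta(α, β) prior on a coin's probability of heads and Bernoulli observations, the posterior is Beta(α + heads, β + tails). The sketch compares three estimates: the MLE, the MAP estimate (posterior mode), and the posterior mean. The prior Beta(2, 2) and the eight coin flips are made-up illustration values, and the small `Beta` class is a hypothetical helper, not a library type.

```python
from dataclasses import dataclass

@dataclass
class Beta:
    alpha: float
    beta: float

    def update(self, observations):
        """Conjugate update: add successes to alpha and failures to beta."""
        heads = sum(observations)
        tails = len(observations) - heads
        return Beta(self.alpha + heads, self.beta + tails)

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def mode(self):
        # MAP estimate; this formula needs alpha > 1 and beta > 1
        return (self.alpha - 1) / (self.alpha + self.beta - 2)

data = [1, 0, 1, 1, 1, 0, 1, 1]   # 6 heads, 2 tails
prior = Beta(2.0, 2.0)            # mild prior belief that the coin is fair
posterior = prior.update(data)

mle = sum(data) / len(data)       # 0.75, ignores the prior entirely
print(posterior, mle, posterior.mode(), posterior.mean())
```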
08 · Optimization: SGD, Adam, constrained optimization, regularization

| Topic | ML Connection |
| :--- | :--- |
| Convex Optimization | Global guarantees, convergence proofs |
| Gradient Descent | The engine behind all training |
| Second-Order Methods | Newton, BFGS: faster convergence |
| Constrained Optimization | Lagrange multipliers, KKT conditions |
| Stochastic Optimization | SGD, mini-batch: scaling to big data |
| Optimization Landscape | Local minima, saddle points, loss surfaces |
| Adaptive Learning Rate | Adam, RMSProp, AdaGrad |
| Regularization Methods | L1/L2, Dropout, weight decay |
| Hyperparameter Optimization | Grid search, Bayesian optimization |
| Learning Rate Schedules | Warmup, cosine annealing, step decay |

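
The Adaptive Learning Rate row covers Adam. Below is a minimal NumPy sketch of the standard Adam update: exponential moving averages of the gradient and the squared gradient, with bias correction. It runs on a hypothetical ill-conditioned quadratic `f(w) = ½ wᵀAw`, whose minimum is at the origin. The values β₁ = 0.9, β₂ = 0.999 and ε = 1e-8 are the commonly used defaults. The learning rate of 0.05 and the 2000 steps are illustrative choices for this toy problem, not tuned values.

```python
import numpy as np

A = np.diag([1.0, 100.0])       # curvatures differ by a factor of 100

def grad(w):
    return A @ w                # gradient of f(w) = 0.5 * w^T A w

def adam(w0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)        # first moment: moving average of gradients
    v = np.zeros_like(w)        # second moment: moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)     # correct the bias from zero initialization
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # each coordinate scaled by its own gradient history
    return w

w0 = np.array([3.0, -2.0])
w_final = adam(w0)
print(w_final)
```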

Information Theory & Numerical Methods

09 · Information Theory: Entropy, KL divergence, cross-entropy

| Topic | ML Connection |
| :--- | :--- |
| Entropy | Decision tree splits, uncertainty measurement |
| KL Divergence | VAE loss, knowledge distillation |
| Mutual Information | Feature selection, InfoGAN |
| Cross-Entropy | The most common classification loss |
| Fisher Information | Efficient estimation, natural gradient |

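
Entropy, KL divergence, and cross-entropy are linked by `H(p, q) = H(p) + KL(p ‖ q)`. Because `H(p)` does not depend on the model `q`, minimizing cross-entropy against a fixed data distribution is equivalent to minimizing the KL divergence to it. The numeric check below uses two made-up distributions and natural logarithms (units of nats), and assumes every probability is strictly positive.

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))          # H(p)

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))          # H(p, q)

def kl_divergence(p, q):
    return np.sum(p * np.log(p / q))       # KL(p || q)

# Made-up distributions; every entry must be > 0 for these formulas
p = np.array([0.7, 0.2, 0.1])   # "data" distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution

print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))  # equal up to rounding
```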
10 · Numerical Methods: Floating-point, stability, interpolation, integration

📖 [Chapter README](10-Numerical-Methods/README.md)

| Topic | ML Connection |
| :--- | :--- |
| Floating-Point Arithmetic | Mixed precision training (FP16/BF16/FP8), loss scaling, FlashAttention numerics |
| Numerical Linear Algebra | Stable solvers, iterative methods (CG/Lanczos), condition number for training |
| Numerical Optimization | L-BFGS two-loop, Armijo line search, gradient checking, trust-region methods |
| Interpolation & Approximation | RoPE/sinusoidal PE, KAN B-splines, Runge's phenomenon, FFT, random Fourier features |
| Numerical Integration | Gaussian quadrature, Monte Carlo variance reduction, reparameterization trick (VAE ELBO) |

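
On the Floating-Point Arithmetic row: `exp(1000)` overflows float64, so a naive softmax over large logits returns NaN. The usual fix is the log-sum-exp trick, `log Σ exp(x_i) = m + log Σ exp(x_i − m)` with `m = max x_i`. It is mathematically identical but keeps every exponent at or below zero. The minimal NumPy sketch below shows both versions (SciPy also provides `scipy.special.logsumexp`).

```python
import numpy as np

def naive_softmax(x):
    e = np.exp(x)                  # overflows to inf once x exceeds about 709
    return e / e.sum()

def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))   # every exponent is <= 0

def stable_softmax(x):
    return np.exp(x - logsumexp(x))            # softmax_i = exp(x_i - logsumexp(x))

logits = np.array([1000.0, 1001.0, 1002.0])

with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(logits))   # [nan nan nan]: inf / inf
print(stable_softmax(logits))      # same as softmax of [0, 1, 2]
```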

Specialized Mathematics

11 · Graph Theory: Graph algorithms, spectral methods, GNNs

| Topic | ML Connection |
| :--- | :--- |
| Graph Basics | Social networks, molecular graphs |
| Graph Representations | Adjacency/Laplacian matrices |
| Graph Algorithms | Shortest path, centrality, traversal |
| Spectral Graph Theory | Community detection, graph wavelets |
| Graph Neural Networks | Message passing, GCN, GAT |
| Random Graphs | Erdős–Rényi, network analysis |

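
One fact connects the Graph Representations and Spectral Graph Theory rows: for the unnormalized Laplacian `L = D − A`, the number of zero eigenvalues equals the number of connected components. The sketch below checks this on a small hypothetical graph made of a triangle and a separate edge, which gives two components.

```python
import numpy as np

# Hypothetical undirected graph: a triangle {0, 1, 2} plus a separate edge {3, 4}
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n = 5

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0          # symmetric adjacency matrix

D = np.diag(A.sum(axis=1))           # degree matrix
L = D - A                            # unnormalized graph Laplacian

eigvals = np.linalg.eigvalsh(L)      # L is symmetric, so eigvalsh applies (ascending order)
n_components = int(np.sum(np.abs(eigvals) < 1e-9))
print(eigvals.round(6), n_components)
```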
12 · Functional Analysis: Hilbert spaces, kernel methods

| Topic | ML Connection |
| :--- | :--- |
| Normed Spaces | Regularization theory |
| Hilbert Spaces | RKHS, function space learning |
| Kernel Methods | SVM, Gaussian processes, kernel trick |


ML-Specific Mathematics

13 · ML-Specific Math: Loss functions, activations, normalization, sampling

| Topic | ML Connection |
| :--- | :--- |
| Loss Functions | MSE, cross-entropy, hinge, contrastive |
| Activation Functions | ReLU, GELU, sigmoid, softmax, and their gradients |
| Normalization Techniques | BatchNorm, LayerNorm, RMSNorm |
| Sampling Methods | MCMC, rejection sampling, importance sampling |

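
The Normalization Techniques row lists LayerNorm and RMSNorm. As usually defined, LayerNorm centers each feature vector and divides by its standard deviation, then applies a learned scale γ and shift β. RMSNorm skips the centering and divides by the root mean square. The NumPy sketch below normalizes over the last axis. The eps values, the unit γ, and the zero β are illustrative defaults, not values taken from this course.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)            # per-token mean over features
    var = x.var(axis=-1, keepdims=True)            # per-token (biased) variance
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-6):
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return gamma * x / rms                         # no mean subtraction, no shift

rng = np.random.default_rng(0)
x = 3.0 + 2.0 * rng.standard_normal((4, 16))       # 4 tokens, 16 features, shifted and scaled
d = x.shape[-1]
gamma, beta = np.ones(d), np.zeros(d)

ln = layer_norm(x, gamma, beta)
rn = rms_norm(x, gamma)
print(ln.mean(axis=-1).round(6), ln.std(axis=-1).round(3))
```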
14 · Math for Specific Models: NNs, CNNs, RNNs, Transformers, GANs, RL

| Topic | ML Connection |
| :--- | :--- |
| Linear Models | Regression, classification foundations |
| Neural Networks | Universal approximation, backprop math |
| Probabilistic Models | GMMs, HMMs, variational inference |
| RNN & LSTM Math | Vanishing gradients, gating mechanisms |
| Transformer Architecture | The math behind "Attention Is All You Need" |
| Reinforcement Learning | Bellman equations, policy gradients |
| Generative Models | VAEs, GANs, diffusion models |
| CNN & Convolution Math | Convolution theorem, pooling, receptive fields |


LLM Mathematics

15 · Math for LLMs: Attention, embeddings, scaling laws, inference

| Topic | ML Connection |
| :--- | :--- |
| Tokenization Math | BPE, WordPiece: information-theoretic foundations |
| Embedding Space Math | Geometric properties of learned representations |
| Attention Mechanism Math | Scaled dot-product, multi-head, causal masking |
| Positional Encodings | Sinusoidal, RoPE, ALiBi |
| Language Model Probability | Next-token prediction, perplexity |
| Training at Scale | Distributed training, gradient accumulation |
| Fine-Tuning Math | LoRA, adapters, parameter-efficient methods |
| Scaling Laws | Chinchilla, compute-optimal training |
| Efficient Attention & Inference | FlashAttention, KV-cache, speculative decoding |
| Mixture of Experts & Routing | Sparse gating, load balancing |
| Quantization & Distillation | INT8/INT4, knowledge distillation |
| RAG Math & Retrieval | Retrieval-augmented generation |
| Serving & Systems Tradeoffs | Latency, throughput, batching strategies |

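
The Attention Mechanism Math row refers to scaled dot-product attention, `softmax(QKᵀ / √d_k) V`, with a causal mask so that position i attends only to positions ≤ i. Below is a single-head NumPy sketch with random inputs. Learned projections, batching, and multiple heads are deliberately left out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)       # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    T, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                     # (T, T) scaled dot products
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where key index > query index
    scores = np.where(future, -np.inf, scores)          # exp(-inf) = 0 after softmax
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d_k, d_v = 4, 8, 8               # sequence length and head sizes (illustrative)
Q = rng.standard_normal((T, d_k))
K = rng.standard_normal((T, d_k))
V = rng.standard_normal((T, d_v))

out, weights = causal_attention(Q, K, V)
print(weights.round(2))             # upper triangle is all zeros
```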
16 · LLM Training Data Pipeline: Data quality, deduplication, mixture optimization

| Topic | Description |
| :--- | :--- |
| Data Format Standards | JSONL, tokenized formats, schema validation |
| JSONL Generation | Efficient serialization for training |
| Quality Checks | Filtering, decontamination, toxicity |
| Full Dataset Assembly | Combining and balancing data sources |
| Contamination & Dedup Audits | Preventing benchmark leakage |
| Documentation & Governance | Data cards, provenance tracking |
| Data Mixture Optimization | Optimal domain ratios for training |


Evaluation, Safety & Production

17 · Evaluation & Reliability: Benchmarks, calibration, A/B testing

| Topic | Description |
| :--- | :--- |
| Capability Benchmarks | MMLU, HumanEval, evaluation methodology |
| Calibration & Uncertainty | Confidence vs. accuracy alignment |
| Robustness & Distribution Shift | Out-of-distribution detection |
| Error Analysis & Ablations | Systematic debugging |
| Online Experimentation & A/B Testing | Statistical rigor in deployment |

18 · Alignment & Safety: SFT, RLHF, DPO, red-teaming

| Topic | Description |
| :--- | :--- |
| Instruction Tuning & SFT | Supervised fine-tuning mathematics |
| Preference Optimization (RLHF & DPO) | Reward modeling, Bradley-Terry, DPO objective |
| Red-Teaming & Safety Evaluations | Adversarial robustness testing |
| Policy & Guardrails | Constitutional AI, rule-based filtering |
| Human-in-the-Loop & Monitoring | Active learning, feedback loops |

19 · Production ML & MLOps: Serving, monitoring, drift detection

| Topic | Description |
| :--- | :--- |
| Data Versioning & Lineage | Reproducibility at scale |
| Experiment Tracking | MLflow, W&B: systematic experimentation |
| Feature Stores & Data Contracts | Consistent feature engineering |
| Model Serving & Inference Optimization | Latency, batching, hardware |
| Monitoring, Drift & Retraining | Detecting degradation |
| LLM Evaluation, Observability & Guardrails | LLM-specific ops |


Advanced Theory

20 · Fourier Analysis & Signal Processing: FFT, wavelets, convolution theorem

| Topic | ML Connection |
| :--- | :--- |
| Fourier Series | Periodic signal decomposition |
| Fourier Transform | Frequency domain analysis |
| DFT & FFT | Efficient spectral computation |
| Convolution Theorem | CNNs in frequency domain |
| Wavelets | Multi-resolution analysis, time-frequency |

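
The Convolution Theorem row, in code: convolution in the signal domain becomes pointwise multiplication in the frequency domain. Zero-padding both signals to length N + M − 1 makes the FFT's circular convolution equal to ordinary linear convolution, which `np.convolve` computes directly. The random signal and filter below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(64)      # placeholder signal
h = rng.standard_normal(9)       # placeholder filter

n = len(x) + len(h) - 1          # length of the full linear convolution

# Direct convolution: O(N * M) multiply-adds
direct = np.convolve(x, h)       # mode="full" is the default

# Convolution theorem: transform, multiply pointwise, transform back
X = np.fft.rfft(x, n)            # rfft zero-pads x to length n
H = np.fft.rfft(h, n)
via_fft = np.fft.irfft(X * H, n)

print(np.allclose(direct, via_fft))  # True
```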
21 · Statistical Learning Theory: PAC learning, VC dimension, generalization

| Topic | ML Connection |
| :--- | :--- |
| PAC Learning | Learnability guarantees |
| VC Dimension | Model complexity measurement |
| Bias-Variance Tradeoff | The fundamental modeling tension |
| Generalization Bounds | Why models work on unseen data |
| Rademacher Complexity | Data-dependent complexity measures |

22 · Causal Inference: SCMs, do-calculus, counterfactuals

| Topic | ML Connection |
| :--- | :--- |
| Structural Causal Models | Beyond correlation |
| Do-Calculus | Interventional reasoning |
| Counterfactuals | "What if" reasoning |
| Causal Discovery | Learning causal structure from data |

23 · Game Theory: Nash equilibria, minimax, adversarial methods

| Topic | ML Connection |
| :--- | :--- |
| Nash Equilibria | GAN training dynamics |
| Minimax Theorem | Adversarial robustness |
| Multi-Agent Systems | Cooperative/competitive learning |
| Adversarial Game Theory | Security and robustness |

24 · Measure Theory: Sigma-algebras, Lebesgue integration, probability spaces

| Topic | ML Connection |
| :--- | :--- |
| Sigma-Algebras | Rigorous probability foundations |
| Lebesgue Integration | Expectation in continuous spaces |
| Probability Measure Spaces | Formal probability theory |
| Radon-Nikodym Theorem | Density ratios, importance sampling |

25 · Differential Geometry: Manifolds, Riemannian geometry, geodesics

| Topic | ML Connection |
| :--- | :--- |
| Manifolds | Manifold hypothesis: data often lies near low-dimensional manifolds |
| Riemannian Geometry | Natural gradient, information geometry |
| Geodesics | Shortest paths in curved spaces |
| Optimization on Manifolds | Constrained optimization on curved surfaces |


📖 Resources

The docs/ folder contains supplementary references:

Document Description
ML Math Map Visual guide โ€” which math is used where in ML
Notation Guide Consistent notation conventions across the course
Cheatsheet Quick-reference formula sheet
Interview Prep Common ML math interview questions with solutions
Visualization Guide Tips for building mathematical intuition visually

🛠️ Tech Stack

| Tool | Purpose |
| :--- | :--- |
| Python 3.8+ | Primary language |
| NumPy / SciPy | Numerical computing |
| Matplotlib / Seaborn / Plotly | Visualizations |
| SymPy | Symbolic mathematics |
| JupyterLab | Interactive notebooks |
| scikit-learn | ML examples and demos |

🤝 Feedback & Suggestions

If you spot a typo or an error in a formula, or have an idea for a new visualization or practice exercise, please submit feedback or suggest an improvement.


Built for learners, researchers, and engineers who believe understanding the math makes you a better AI practitioner.


"In God we trust. All others must bring data." (W. Edwards Deming)

