Mathematics for AI / ML / LLM

The Complete Math Foundation You Need to Master AI

A structured, comprehensive curriculum covering 25 domains of mathematics essential for AI, Machine Learning, and Large Language Models — from foundational concepts to cutting-edge research.


Why This Course?

Most AI/ML learners hit the math wall — papers full of symbols that feel alien, optimization steps that seem like magic, and model architectures that assume deep mathematical fluency.

This course bridges that gap with a learn-by-doing approach:

Structured path from high school math to research-level topics
Notes → Theory → Exercises flow for every topic
Interactive Jupyter notebooks with visualizations, not just formulas
Real ML connections — every concept links to practical AI applications
Self-contained — no prerequisites beyond basic algebra

"The math you need depends on what you're building — this course helps you find exactly that."

🚀 Learning Flow

Each topic in the curriculum follows a structured 3-step learning flow designed to build deep intuition and practical skill:

📖 notes.md          → Read the concepts, formulas, and intuition
🔬 theory.ipynb      → Explore interactive Python/Jupyter code demonstrations
✏️ exercises.ipynb   → Solve practice problems to test your understanding

Use the left sidebar navigation to explore topics, or follow the Learning Roadmap below.
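As a minimal sketch of the three-file flow above, the snippet below reports which of the three files are missing from a topic folder. The file names come from the flow itself; the folder path and the helper name `missing_files` are hypothetical, since the exact directory layout is not shown here.

```python
from pathlib import Path

# File names taken from the learning flow above.
EXPECTED_FILES = ("notes.md", "theory.ipynb", "exercises.ipynb")


def missing_files(topic_dir):
    """Return the expected files that are absent from topic_dir, in flow order."""
    topic = Path(topic_dir)
    return [name for name in EXPECTED_FILES if not (topic / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path: replace with a real topic folder from your checkout.
    print(missing_files("path/to/topic"))
```

An empty list means the topic folder has everything needed to follow the notes, theory, exercises sequence.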

🗺️ Learning Roadmap

The curriculum covers 25 domains organized in 8 phases. Each phase builds on the previous one. Topics marked with ★ are critical for ML/AI.

  START HERE
      │
      ▼
 ┌──────────┐     ┌─────────────┐     ┌──────────┐     ┌──────────────┐
 │ Phase 1  │ ──▶ │   Phase 2   │ ──▶ │ Phase 3  │ ──▶ │   Phase 4    │
 │ Core     │     │ Probability │     │ Learning │     │ Deeper       │
 │ Math     │     │ & Stats     │     │ Engines  │     │ Theory       │
 └──────────┘     └─────────────┘     └──────────┘     └──────────────┘
                                                              │
      ┌───────────────────────────────────────────────────────┘
      ▼
 ┌──────────┐     ┌──────────┐     ┌──────────────┐     ┌──────────────┐
 │ Phase 5  │ ──▶ │ Phase 6  │ ──▶ │   Phase 7    │ ──▶ │   Phase 8    │
 │ ML Math  │     │ LLM Math │     │ Production   │     │ Research     │
 │          │     │          │     │ & Safety     │     │ Frontiers    │
 └──────────┘     └──────────┘     └──────────────┘     └──────────────┘

Phase 1 — Core Math Foundations `Ch. 01 → 02 → 04 → 05`

The mathematical language everything else is built on.

① MATHEMATICAL FOUNDATIONS (Ch.01)
├── Number Systems (ℕ ℤ ℚ ℝ ℂ)
├── Sets & Logic
├── Functions & Mappings
├── Σ Summation & Product Notation
├── Einstein Summation & Index Notation
└── Proof Techniques (induction, contradiction, direct)
        │
        ├──────────────────────────────────────────┐
        ▼                                          ▼
② LINEAR ALGEBRA (Ch.02 + Ch.03)            ③ CALCULUS (Ch.04 + Ch.05)
├── Vectors & Spaces                        ├── Limits & Continuity
├── Matrix Operations                       ├── Derivatives & Differentiation
├── Systems of Equations                    ├── Integration & Series
├── Determinants & Rank                     ├── Partial Derivatives & Gradients
├── Eigenvalues & Eigenvectors ★            ├── Jacobians & Hessians ★
├── SVD ★                                   ├── Chain Rule → Backpropagation ★
├── PCA                                     ├── Optimality Conditions
├── Orthogonality & Norms                   └── Automatic Differentiation ★
├── Positive Definite Matrices
└── Matrix Decompositions (LU/QR/Cholesky)
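To give a flavor of the "Chain Rule → Backpropagation" topic listed above, here is a minimal sketch in plain Python. It differentiates a one-neuron model by multiplying local derivatives, then compares the result with a finite-difference estimate. The model, the function names, and the input values are illustrative assumptions, not taken from the course notebooks.

```python
import math


def forward(w, x, y):
    """Tiny one-neuron model: loss = (sigmoid(w * x) - y) ** 2."""
    z = w * x
    a = 1.0 / (1.0 + math.exp(-z))
    return (a - y) ** 2


def backward(w, x, y):
    """Analytic dloss/dw via the chain rule, one local derivative per step."""
    z = w * x
    a = 1.0 / (1.0 + math.exp(-z))
    dloss_da = 2.0 * (a - y)   # derivative of (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)      # derivative of the sigmoid with respect to z
    dz_dw = x                  # derivative of w * x with respect to w
    return dloss_da * da_dz * dz_dw


def numeric_grad(w, x, y, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (forward(w + eps, x, y) - forward(w - eps, x, y)) / (2.0 * eps)


if __name__ == "__main__":
    print(backward(0.5, 2.0, 1.0), numeric_grad(0.5, 2.0, 1.0))
```

The two printed numbers should agree to many decimal places. This kind of gradient check is a common way to catch mistakes in hand-written derivatives.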

Phase 2 — Probabilistic Thinking `Ch. 06 → 07`

How to reason about uncertainty — the foundation of all ML inference.

④ PROBABILITY & STATISTICS (Ch.06 + Ch.07)
├── Random Variables & Distributions
├── Joint Distributions
├── Expectation & Moments
├── Concentration Inequalities
├── Stochastic Processes
├── Markov Chains ★
├── Descriptive Statistics
├── Estimation Theory & MLE
├── Bayesian Inference ★
├── Hypothesis Testing
├── Time Series
└── Regression Analysis

Phase 3 — Making Models Learn `Ch. 08 → 09`

The algorithms that train every model and the theory behind loss functions.

⑤ OPTIMIZATION (Ch.08)                     ⑥ INFORMATION THEORY (Ch.09)
├── Convex Optimization ★                  ├── Entropy (Shannon) ★
├── Gradient Descent (SGD/Mini-batch) ★    ├── KL Divergence ★
├── Second-Order Methods (Newton/BFGS)     ├── Mutual Information ★
├── Constrained Optimization (KKT)         ├── Cross-Entropy ★
├── Stochastic Optimization ★              └── Fisher Information
├── Optimization Landscape
├── Adaptive LR (Adam / RMSProp) ★
├── Regularization (L1/L2/Dropout)
├── Hyperparameter Optimization
└── Learning Rate Schedules
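The two columns above meet in practice: an optimizer (here, plain full-batch gradient descent) minimizes an information-theoretic loss (here, binary cross-entropy). The following is a small sketch under stated assumptions: the toy data, the learning rate, and the function names are made up for illustration and do not come from the course materials.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(w, b, xs, ys):
    """Mean binary cross-entropy of a 1-D logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def gradient_step(w, b, xs, ys, lr):
    """One full-batch gradient descent update."""
    n = len(xs)
    # For sigmoid + cross-entropy, the gradient w.r.t. the logit is (p - y).
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b


# Toy data (made up for illustration): negative inputs belong to class 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
losses = []
for _ in range(100):
    losses.append(cross_entropy(w, b, xs, ys))
    w, b = gradient_step(w, b, xs, ys, lr=0.5)
```

With `w = b = 0` the model predicts 0.5 everywhere, so the first recorded loss equals ln 2. Each step then lowers the loss as `w` grows to separate the two classes.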

Phase 4 — Deeper Theory `Ch. 03 → 10 → 11 → 12`

Specialized math that powers specific ML architectures.

⑦ NUMERICAL METHODS (Ch.10)                ⑧ GRAPH THEORY (Ch.11)
├── Floating-Point Arithmetic              ├── Graph Basics & Representations
├── Numerical Linear Algebra               ├── Graph Algorithms
├── Numerical Optimization                 ├── Spectral Graph Theory ★
├── Interpolation & Approximation          ├── Graph Neural Networks ★
└── Numerical Integration                  └── Random Graphs

⑨ FUNCTIONAL ANALYSIS (Ch.12)
├── Normed Spaces
├── Hilbert Spaces ★
└── Kernel Methods (SVM / GP) ★

Phase 5 — ML Math in Practice `Ch. 13 → 14`

The math that directly appears inside ML models.

                        ╔════════════════════╗
                        ║  ML-SPECIFIC MATH  ║
                        ╚════════════════════╝
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
⑩ ML MATH CORE (Ch.13)   ⑪ DEEP LEARNING (Ch.14)   ⑫ RL (Ch.14)
├── Loss Functions ★      ├── Neural Net Math ★      ├── MDP (State/Action)
├── Activation Fns ★      ├── CNN & Convolution ★    ├── Bellman Equations ★
├── Normalization ★       ├── RNN & LSTM Math ★      ├── Policy Gradient ★
└── Sampling Methods      ├── Transformer ★          ├── Value Functions ★
                          ├── Generative (VAE/GAN)   └── Actor-Critic
                          └── Probabilistic Models

Phase 6 — LLM Math `Ch. 15 → 16`

Everything that makes Large Language Models work under the hood.

                         ╔══════════════════╗
                         ║  MATH FOR LLMs   ║
                         ╚══════════════════╝
                                  │
       ┌──────────────────────────┼──────────────────────────┐
       ▼                          ▼                          ▼
⑬ ATTENTION & ARCH         ⑭ TRAINING AT SCALE       ⑮ DATA PIPELINE
   (Ch.15)                    (Ch.15)                    (Ch.16)
├── Tokenization            ├── Scaling Laws ★         ├── Data Format Standards
├── Embedding Space         ├── Training at Scale      ├── JSONL Generation
├── Attention Mech ★        ├── Efficient Attention    ├── Quality Checks
├── Positional Enc ★        ├── MoE & Routing          ├── Dataset Assembly
└── LM Probability ★        ├── Quantization           ├── Contamination & Dedup
                            ├── Distillation           ├── Documentation
                            └── RAG & Retrieval        └── Data Mixture Optimization

Phase 7 — Production & Safety `Ch. 17 → 18 → 19`

Ship models responsibly — evaluate, align, and monitor.

⑯ EVALUATION (Ch.17)           ⑰ ALIGNMENT & SAFETY (Ch.18)     ⑱ PRODUCTION (Ch.19)
├── Capability Benchmarks      ├── SFT Math ★                    ├── Data Versioning & Lineage
├── Calibration & Uncertainty  ├── RLHF Math ★                   ├── Experiment Tracking
├── Robustness &               ├── DPO / Preference Opt ★        ├── Feature Stores &
│   Distribution Shift         ├── Policy & Guardrails           │   Data Contracts
├── Error Analysis &           └── Human-in-the-Loop             ├── Model Serving &
│   Ablations                      & Monitoring                  │   Inference Optimization
└── A/B Testing &                                                ├── Monitoring, Drift
    Experimentation                                              │   & Retraining
                                                                 └── LLM Observability
                                                                     & Guardrails

Phase 8 — Research Frontiers `Ch. 20 → 21 → 22 → 23 → 24 → 25`

Advanced theory for research-level understanding.

⑲ FOURIER & SIGNALS (Ch.20)    ⑳ STATISTICAL LEARNING (Ch.21)   ㉑ CAUSAL INFERENCE (Ch.22)
├── Fourier Series              ├── PAC Learning                  ├── Structural Causal Models
├── Fourier Transform           ├── VC Dimension                  ├── Do-Calculus
├── DFT & FFT                  ├── Bias-Variance Tradeoff        ├── Counterfactuals
├── Convolution Theorem         ├── Generalization Bounds         └── Causal Discovery
└── Wavelets                    └── Rademacher Complexity

㉒ GAME THEORY (Ch.23)          ㉓ MEASURE THEORY (Ch.24)         ㉔ DIFF. GEOMETRY (Ch.25)
├── Nash Equilibria             ├── Sigma-Algebras                ├── Manifolds
├── Minimax Theorem             ├── Lebesgue Integration          ├── Riemannian Geometry
├── Multi-Agent Systems         ├── Probability Measure Spaces    ├── Geodesics
└── Adversarial Game Theory     └── Radon-Nikodym Theorem         └── Optimization on Manifolds

╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║        ★  YOU ARE NOW A MATH-FOR-AI WIZARD  ★                 ║
║                                                               ║
║        ★ = Critical for ML/AI — prioritize these topics       ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝

Quick Reference — Learning Order

Phase	Chapters	Focus
Phase 1 — Core Foundations	01 → 02 → 04 → 05	Numbers, vectors, matrices, derivatives, gradients
Phase 2 — Probabilistic Thinking	06 → 07	Random variables, distributions, estimation, inference
Phase 3 — Making Models Learn	08 → 09	Optimization algorithms, information-theoretic losses
Phase 4 — Deeper Theory	03 → 10 → 11 → 12	Advanced linear algebra, numerical methods, graphs, kernels
Phase 5 — ML Math in Practice	13 → 14	Loss functions, activations, architecture-specific math
Phase 6 — LLM Math	15 → 16	Attention, embeddings, scaling laws, training pipelines
Phase 7 — Production & Safety	17 → 18 → 19	Evaluation, alignment (RLHF/DPO), MLOps
Phase 8 — Research Frontiers	20 → 21 → 22 → 23 → 24 → 25	Fourier analysis, learning theory, causality, geometry

📚 Chapters

Core Mathematics

01 · Mathematical Foundations — Number systems, sets, logic, proofs

02 · Linear Algebra Basics — Vectors, matrices, systems of equations

03 · Advanced Linear Algebra — Eigen decomposition, SVD, PCA

04 · Calculus Fundamentals — Limits, derivatives, integrals, series

05 · Multivariate Calculus — Gradients, Jacobians, backpropagation

Probability, Statistics & Optimization

06 · Probability Theory — Distributions, expectations, stochastic processes

07 · Statistics — Estimation, testing, Bayesian inference, regression

08 · Optimization — SGD, Adam, constrained optimization, regularization

Information Theory & Numerical Methods

09 · Information Theory — Entropy, KL divergence, cross-entropy

10 · Numerical Methods — Floating-point, stability, interpolation, integration

📖 [Chapter README](10-Numerical-Methods/README.md)

| Topic                         | ML Connection                                                                            |
| :---------------------------- | :--------------------------------------------------------------------------------------- |
| Floating-Point Arithmetic     | Mixed precision training (FP16/BF16/FP8), loss scaling, Flash Attention numerics         |
| Numerical Linear Algebra      | Stable solvers, iterative methods (CG/Lanczos), condition number for training            |
| Numerical Optimization        | L-BFGS two-loop, Armijo line search, gradient checking, trust-region methods             |
| Interpolation & Approximation | RoPE/sinusoidal PE, KAN B-splines, Runge's phenomenon, FFT, random Fourier features      |
| Numerical Integration         | Gaussian quadrature, Monte Carlo variance reduction, reparameterization trick (VAE ELBO) |
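The floating-point row mentions loss scaling. As a rough sketch of why it helps, the snippet below rounds a small gradient to FP16 using only the standard library (`struct` format `"e"` is IEEE 754 half precision). The gradient value and the fixed scale of 65536 are illustrative assumptions; real mixed-precision training typically adjusts the scale dynamically.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8        # a small gradient value, chosen for illustration
scale = 65536.0    # illustrative power-of-two loss scale

unscaled = to_fp16(grad)          # too small for FP16: rounds to 0.0
scaled = to_fp16(grad * scale)    # lands in FP16's normal range
recovered = scaled / scale        # unscale in higher precision

if __name__ == "__main__":
    print(unscaled, scaled, recovered)
```

Without scaling, the gradient is lost entirely. With scaling, it survives the round trip with only a small relative rounding error.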

Specialized Mathematics

11 · Graph Theory — Graph algorithms, spectral methods, GNNs

12 · Functional Analysis — Hilbert spaces, kernel methods

ML-Specific Mathematics

13 · ML-Specific Math — Loss functions, activations, normalization, sampling

14 · Math for Specific Models — NNs, CNNs, RNNs, Transformers, GANs, RL

LLM Mathematics

15 · Math for LLMs — Attention, embeddings, scaling laws, inference

16 · LLM Training Data Pipeline — Data quality, deduplication, mixture optimization

Evaluation, Safety & Production

17 · Evaluation & Reliability — Benchmarks, calibration, A/B testing

18 · Alignment & Safety — SFT, RLHF, DPO, red-teaming

19 · Production ML & MLOps — Serving, monitoring, drift detection

Advanced Theory

20 · Fourier Analysis & Signal Processing — FFT, wavelets, convolution theorem

21 · Statistical Learning Theory — PAC learning, VC dimension, generalization

22 · Causal Inference — SCMs, do-calculus, counterfactuals

23 · Game Theory — Nash equilibria, minimax, adversarial methods

24 · Measure Theory — Sigma-algebras, Lebesgue integration, probability spaces

25 · Differential Geometry — Manifolds, Riemannian geometry, geodesics

📖 Resources

The docs/ folder contains supplementary references:

Document	Description
ML Math Map	Visual guide — which math is used where in ML
Notation Guide	Consistent notation conventions across the course
Cheatsheet	Quick-reference formula sheet
Interview Prep	Common ML math interview questions with solutions
Visualization Guide	Tips for building mathematical intuition visually

🛠️ Tech Stack

Tool	Purpose
Python 3.8+	Primary language
NumPy / SciPy	Numerical computing
Matplotlib / Seaborn / Plotly	Visualizations
SymPy	Symbolic mathematics
Jupyter Lab	Interactive notebooks
scikit-learn	ML examples and demos
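Before opening the notebooks, it can help to confirm that the tools in the table are available. The check below is a sketch, not an official setup script: the helper name `check_environment` is hypothetical, and the import names are the usual ones for these packages (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for the tools in the table above.
PACKAGES = ("numpy", "scipy", "matplotlib", "seaborn", "plotly",
            "sympy", "jupyterlab", "sklearn")


def check_environment(packages=PACKAGES, min_python=(3, 8)):
    """Report whether Python is new enough and which packages are importable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key:12s} {'ok' if ok else 'MISSING'}")
```

Anything reported as `MISSING` needs to be installed before the corresponding notebooks will run.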

🤝 Feedback & Suggestions

If you find typos or errors in formulas, or have ideas for new visualizations and practice exercises, please submit feedback.

Built for learners, researchers, and engineers who believe understanding the math makes you a better AI practitioner.

"In God we trust. All others must bring data." — W. Edwards Deming