Mathematics for AI / ML / LLM
The Complete Math Foundation You Need to Master AI
A structured, comprehensive curriculum covering 25 domains of mathematics essential for AI, Machine Learning, and Large Language Models, from foundational concepts to cutting-edge research.
Get Started · Chapters · Resources
Why This Course?
Most AI/ML learners hit the math wall: papers full of symbols that feel alien, optimization steps that seem like magic, and model architectures that assume deep mathematical fluency.
This course bridges that gap with a learn-by-doing approach:
- Structured path from high school math to research-level topics
- Notes → Theory → Exercises flow for every topic
- Interactive Jupyter notebooks with visualizations, not just formulas
- Real ML connections: every concept links to practical AI applications
- Self-contained: no prerequisites beyond basic algebra
"The math you need depends on what you're building; this course helps you find exactly that."
Learning Flow
Each topic in the curriculum follows a structured 3-step learning flow designed to build deep intuition and practical skill:
- notes.md: Read the concepts, formulas, and intuition
- theory.ipynb: Explore interactive Python/Jupyter code demonstrations
- exercises.ipynb: Solve practice problems to test your understanding
Use the left sidebar navigation to explore topics, or follow the Learning Roadmap below.
Learning Roadmap
The curriculum covers 25 domains organized in 8 phases. Each phase builds on the previous one. Topics marked with ★ are critical for ML/AI.
START HERE → Phase 1 (Core Math) → Phase 2 (Probability & Stats) → Phase 3 (Learning Engines) → Phase 4 (Deeper Theory) → Phase 5 (ML Math) → Phase 6 (LLM Math) → Phase 7 (Production & Safety) → Phase 8 (Research Frontiers)
Phase 1: Core Math Foundations (Ch. 01 → 02 → 04 → 05)
The mathematical language everything else is built on.
① MATHEMATICAL FOUNDATIONS (Ch.01)
- Number Systems (ℕ ℤ ℚ ℝ ℂ)
- Sets & Logic
- Functions & Mappings
- Σ Summation & Product Notation
- Einstein Summation & Index Notation
- Proof Techniques (induction, contradiction, direct)
② LINEAR ALGEBRA (Ch.02 + Ch.03)
- Vectors & Spaces
- Matrix Operations
- Systems of Equations
- Determinants & Rank
- Eigenvalues & Eigenvectors ★
- SVD ★
- PCA
- Orthogonality & Norms
- Positive Definite Matrices
- Matrix Decompositions (LU/QR/Cholesky)
③ CALCULUS (Ch.04 + Ch.05)
- Limits & Continuity
- Derivatives & Differentiation
- Integration & Series
- Partial Derivatives & Gradients
- Jacobians & Hessians ★
- Chain Rule → Backpropagation ★
- Optimality Conditions
- Automatic Differentiation ★
Phase 2: Probabilistic Thinking (Ch. 06 → 07)
How to reason about uncertainty, the foundation of all ML inference.
④ PROBABILITY & STATISTICS (Ch.06 + Ch.07)
- Random Variables & Distributions
- Joint Distributions
- Expectation & Moments
- Concentration Inequalities
- Stochastic Processes
- Markov Chains ★
- Descriptive Statistics
- Estimation Theory & MLE
- Bayesian Inference ★
- Hypothesis Testing
- Time Series
- Regression Analysis
Phase 3: Making Models Learn (Ch. 08 → 09)
The algorithms that train every model and the theory behind loss functions.
⑤ OPTIMIZATION (Ch.08)
- Convex Optimization ★
- Gradient Descent (SGD/Mini-batch) ★
- Second-Order Methods (Newton/BFGS)
- Constrained Optimization (KKT)
- Stochastic Optimization ★
- Optimization Landscape
- Adaptive LR (Adam / RMSProp) ★
- Regularization (L1/L2/Dropout)
- Hyperparameter Optimization
- Learning Rate Schedules
⑥ INFORMATION THEORY (Ch.09)
- Entropy (Shannon) ★
- KL Divergence ★
- Mutual Information ★
- Cross-Entropy ★
- Fisher Information
Phase 4: Deeper Theory (Ch. 03 → 10 → 11 → 12)
Specialized math that powers specific ML architectures.
⑦ NUMERICAL METHODS (Ch.10)
- Floating-Point Arithmetic
- Numerical Linear Algebra
- Numerical Optimization
- Interpolation & Approximation
- Numerical Integration
⑧ GRAPH THEORY (Ch.11)
- Graph Basics & Representations
- Graph Algorithms
- Spectral Graph Theory ★
- Graph Neural Networks ★
- Random Graphs
⑨ FUNCTIONAL ANALYSIS (Ch.12)
- Normed Spaces
- Hilbert Spaces ★
- Kernel Methods (SVM / GP) ★
Phase 5: ML Math in Practice (Ch. 13 → 14)
The math that directly appears inside ML models.
ML-SPECIFIC MATH
⑩ ML MATH CORE (Ch.13)
- Loss Functions ★
- Activation Fns ★
- Normalization ★
- Sampling Methods
⑪ DEEP LEARNING (Ch.14)
- Neural Net Math ★
- CNN & Convolution ★
- RNN & LSTM Math ★
- Transformer ★
- Generative (VAE/GAN)
- Probabilistic Models
⑫ RL (Ch.14)
- MDP (State/Action)
- Bellman Equations ★
- Policy Gradient ★
- Value Functions ★
- Actor-Critic
Phase 6: LLM Math (Ch. 15 → 16)
Everything that makes Large Language Models work under the hood.
MATH FOR LLMs
⑬ ATTENTION & ARCH (Ch.15)
- Tokenization
- Embedding Space
- Attention Mech ★
- Positional Enc ★
- LM Probability ★
⑭ TRAINING AT SCALE (Ch.15)
- Scaling Laws ★
- Training at Scale
- Efficient Attention
- MoE & Routing
- Quantization
- Distillation
- RAG & Retrieval
⑮ DATA PIPELINE (Ch.16)
- Data Format Standards
- JSONL Generation
- Quality Checks
- Dataset Assembly
- Contamination & Dedup
- Documentation
- Data Mixture Optimization
Phase 7: Production & Safety (Ch. 17 → 18 → 19)
Ship models responsibly: evaluate, align, and monitor.
⑯ EVALUATION (Ch.17)
- Capability Benchmarks
- Calibration & Uncertainty
- Robustness & Distribution Shift
- Error Analysis & Ablations
- A/B Testing & Experimentation
⑰ ALIGNMENT & SAFETY (Ch.18)
- SFT Math ★
- RLHF Math ★
- DPO / Preference Opt ★
- Policy & Guardrails
- Human-in-the-Loop & Monitoring
⑱ PRODUCTION (Ch.19)
- Data Versioning & Lineage
- Experiment Tracking
- Feature Stores & Data Contracts
- Model Serving & Inference Optimization
- Monitoring, Drift & Retraining
- LLM Observability & Guardrails
Phase 8: Research Frontiers (Ch. 20 → 21 → 22 → 23 → 24 → 25)
Advanced theory for research-level understanding.
⑲ FOURIER & SIGNALS (Ch.20)
- Fourier Series
- Fourier Transform
- DFT & FFT
- Convolution Theorem
- Wavelets
⑳ STATISTICAL LEARNING (Ch.21)
- PAC Learning
- VC Dimension
- Bias-Variance Tradeoff
- Generalization Bounds
- Rademacher Complexity
㉑ CAUSAL INFERENCE (Ch.22)
- Structural Causal Models
- Do-Calculus
- Counterfactuals
- Causal Discovery
㉒ GAME THEORY (Ch.23)
- Nash Equilibria
- Minimax Theorem
- Multi-Agent Systems
- Adversarial Game Theory
㉓ MEASURE THEORY (Ch.24)
- Sigma-Algebras
- Lebesgue Integration
- Probability Measure Spaces
- Radon-Nikodym Theorem
㉔ DIFF. GEOMETRY (Ch.25)
- Manifolds
- Riemannian Geometry
- Geodesics
- Optimization on Manifolds
YOU ARE NOW A MATH-FOR-AI WIZARD
★ = Critical for ML/AI; prioritize these topics
Quick Reference: Learning Order
| Phase | Chapters | Focus |
|---|---|---|
| Phase 1: Core Foundations | 01 → 02 → 04 → 05 | Numbers, vectors, matrices, derivatives, gradients |
| Phase 2: Probabilistic Thinking | 06 → 07 | Random variables, distributions, estimation, inference |
| Phase 3: Making Models Learn | 08 → 09 | Optimization algorithms, information-theoretic losses |
| Phase 4: Deeper Theory | 03 → 10 → 11 → 12 | Advanced linear algebra, numerical methods, graphs, kernels |
| Phase 5: ML Math in Practice | 13 → 14 | Loss functions, activations, architecture-specific math |
| Phase 6: LLM Math | 15 → 16 | Attention, embeddings, scaling laws, training pipelines |
| Phase 7: Production & Safety | 17 → 18 → 19 | Evaluation, alignment (RLHF/DPO), MLOps |
| Phase 8: Research Frontiers | 20 → 21 → 22 → 23 → 24 → 25 | Fourier analysis, learning theory, causality, geometry |
Chapters
Core Mathematics
01 · Mathematical Foundations: Number systems, sets, logic, proofs
| Topic | Description |
| :--- | :--- |
| Number Systems | Natural, integer, rational, real, and complex numbers (ℕ ℤ ℚ ℝ ℂ) |
| Sets & Logic | Set operations, propositional logic, quantifiers |
| Functions & Mappings | Domain, range, injectivity, surjectivity, composition |
| Summation & Product Notation | Sigma/Pi notation, index manipulation |
| Einstein Summation | Index notation used in tensor operations |
| Proof Techniques | Induction, contradiction, direct proof, contrapositive |
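As a small illustration of the Einstein summation row above, the sketch below uses NumPy's `np.einsum`. The arrays `A`, `B`, `v`, and `M` are made-up examples, not taken from the course notebooks; the point is only that a repeated index in the subscript string is summed over.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))   # hypothetical example matrices
B = rng.normal(size=(4, 5))
v = rng.normal(size=4)

# C_ik = sum_j A_ij B_jk : the repeated index j is summed (matrix product)
C = np.einsum("ij,jk->ik", A, B)

# s = sum_j v_j v_j : inner product of v with itself
s = np.einsum("j,j->", v, v)

# t = sum_i M_ii : trace of a square matrix
M = A @ A.T
t = np.einsum("ii->", M)

# Each einsum result should agree with the conventional NumPy call
print(np.allclose(C, A @ B), np.isclose(s, v @ v), np.isclose(t, np.trace(M)))
```

Writing the index pattern explicitly tends to pay off for tensors with batch or head dimensions, where it is easy to lose track of which axis a plain `@` contracts.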
02 · Linear Algebra Basics: Vectors, matrices, systems of equations
| Topic | ML Connection |
| :--- | :--- |
| Vectors & Spaces | Feature representations, embeddings |
| Matrix Operations | Forward propagation, transformations |
| Systems of Equations | Linear regression (normal equations) |
| Determinants | Change of variables in normalizing flows |
| Matrix Rank | Model capacity, low-rank approximations |
| Vector Spaces & Subspaces | Dimensionality, feature spaces |
03 · Advanced Linear Algebra: Eigendecomposition, SVD, PCA
| Topic | ML Connection |
| :--- | :--- |
| Eigenvalues & Eigenvectors | PCA, spectral clustering, stability analysis |
| Singular Value Decomposition | Recommender systems, dimensionality reduction |
| Principal Component Analysis | Feature extraction, data compression |
| Linear Transformations | Neural network layers as transforms |
| Orthogonality & Orthonormality | Gram-Schmidt, decorrelated features |
| Matrix Norms | Regularization, operator bounds |
| Positive Definite Matrices | Covariance matrices, kernel validity |
| Matrix Decompositions | LU, QR, Cholesky for efficient solvers |
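The link between SVD, eigenvalues, and PCA in the table above can be checked numerically. The following is a minimal sketch on synthetic data (the data matrix `X` and its scaling are assumptions made for illustration): the squared singular values of the centered data, divided by n - 1, should match the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 3 features with very different spreads
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                       # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance captured by each principal direction (rows of Vt)
explained_var = S**2 / (len(X) - 1)

# Same quantities via the eigenvalues of the sample covariance matrix
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# Project onto the top-2 principal components
Z = Xc @ Vt[:2].T

print(explained_var)
print(cov_eigvals)
```

Going through the SVD of the centered data avoids forming the covariance matrix explicitly, which is generally the better-conditioned route.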
04 · Calculus Fundamentals: Limits, derivatives, integrals, series
| Topic | ML Connection |
| :--- | :--- |
| Limits & Continuity | Convergence guarantees, activation smoothness |
| Derivatives & Differentiation | Gradient computation for all parameters |
| Integration | Probability densities, normalization constants |
| Series & Sequences | Taylor approximations, convergence analysis |
05 · Multivariate Calculus: Gradients, Jacobians, backpropagation
| Topic | ML Connection |
| :--- | :--- |
| Partial Derivatives & Gradients | Direction of steepest descent |
| Jacobians & Hessians | Multi-output functions, second-order methods |
| Chain Rule & Backpropagation | Training every neural network |
| Optimality Conditions | Convergence criteria, saddle points |
| Automatic Differentiation | PyTorch autograd, JAX |
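To make the chain-rule row concrete, here is a hedged sketch of backpropagation through a single tanh unit, checked against central finite differences. The loss, the weights `w`, the input `x`, and the target `y` are all illustrative choices, not something the course prescribes.

```python
import numpy as np

def loss(w, x, y):
    # L(w) = 0.5 * (tanh(w . x) - y)^2 for one tanh unit
    return 0.5 * (np.tanh(w @ x) - y) ** 2

def grad_chain_rule(w, x, y):
    z = w @ x                 # forward: pre-activation
    a = np.tanh(z)            # forward: activation
    dL_da = a - y             # backward: dL/da
    da_dz = 1.0 - a ** 2      # backward: derivative of tanh
    return dL_da * da_dz * x  # dz/dw = x, so dL/dw = dL/da * da/dz * x

def grad_finite_diff(w, x, y, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return g

w = np.array([0.3, -0.2, 0.5])
x = np.array([1.0, 2.0, -1.0])
y = 0.25

g_analytic = grad_chain_rule(w, x, y)
g_numeric = grad_finite_diff(w, x, y)
max_abs_diff = np.max(np.abs(g_analytic - g_numeric))
print(g_analytic, g_numeric, max_abs_diff)
```

Automatic differentiation frameworks apply the same backward pass mechanically; a finite-difference check like this one is a common way to validate a hand-derived gradient.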
Probability, Statistics & Optimization
06 · Probability Theory: Distributions, expectations, stochastic processes
| Topic | ML Connection |
| :--- | :--- |
| Random Variables | Output uncertainty, stochastic models |
| Common Distributions | Gaussian, Bernoulli, Poisson as model assumptions |
| Joint Distributions | Multivariate modeling, copulas |
| Expectation & Moments | Loss functions, feature statistics |
| Concentration Inequalities | Generalization bounds, sample complexity |
| Stochastic Processes | Time series, diffusion models |
| Markov Chains | MCMC sampling, language modeling |
07 · Statistics: Estimation, testing, Bayesian inference, regression
| Topic | ML Connection |
| :--- | :--- |
| Descriptive Statistics | EDA, feature engineering |
| Estimation Theory | MLE, MAP (training as estimation) |
| Hypothesis Testing | A/B testing, model comparison |
| Bayesian Inference | Posterior updates, uncertainty quantification |
| Time Series | Sequence forecasting, temporal patterns |
| Regression Analysis | Baseline models, diagnostics |
08 · Optimization: SGD, Adam, constrained optimization, regularization
| Topic | ML Connection |
| :--- | :--- |
| Convex Optimization | Global guarantees, convergence proofs |
| Gradient Descent | The engine behind all training |
| Second-Order Methods | Newton, BFGS for faster convergence |
| Constrained Optimization | Lagrange multipliers, KKT conditions |
| Stochastic Optimization | SGD, mini-batch for scaling to big data |
| Optimization Landscape | Local minima, saddle points, loss surfaces |
| Adaptive Learning Rate | Adam, RMSProp, AdaGrad |
| Regularization Methods | L1/L2, Dropout, weight decay |
| Hyperparameter Optimization | Grid search, Bayesian optimization |
| Learning Rate Schedules | Warmup, cosine annealing, step decay |
Information Theory & Numerical Methods
09 · Information Theory: Entropy, KL divergence, cross-entropy
| Topic | ML Connection |
| :--- | :--- |
| Entropy | Decision tree splits, uncertainty measurement |
| KL Divergence | VAE loss, knowledge distillation |
| Mutual Information | Feature selection, InfoGAN |
| Cross-Entropy | The most common classification loss |
| Fisher Information | Efficient estimation, natural gradient |
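The first, second, and fourth rows of this table are tied together by the identity H(p, q) = H(p) + KL(p ‖ q). The sketch below checks it on two made-up discrete distributions `p` and `q`; it assumes strictly positive probabilities and uses natural logarithms (nats).

```python
import numpy as np

def entropy(p):
    # H(p) = -sum_i p_i log p_i
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i log q_i
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i log(p_i / q_i)
    return np.sum(p * np.log(p / q))

# Illustrative distributions over three outcomes (assumed, strictly positive)
p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution

h_p = entropy(p)
h_pq = cross_entropy(p, q)
kl_pq = kl_divergence(p, q)
print(h_pq, h_p + kl_pq)
```

Because H(p) does not depend on the model, minimizing cross-entropy with respect to q is equivalent to minimizing KL(p ‖ q), which is one way to read the "most common classification loss" entry.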
10 · Numerical Methods: Floating-point, stability, interpolation, integration
[Chapter README](10-Numerical-Methods/README.md)
| Topic | ML Connection |
| :--- | :--- |
| Floating-Point Arithmetic | Mixed precision training (FP16/BF16/FP8), loss scaling, Flash Attention numerics |
| Numerical Linear Algebra | Stable solvers, iterative methods (CG/Lanczos), condition number for training |
| Numerical Optimization | L-BFGS two-loop, Armijo line search, gradient checking, trust-region methods |
| Interpolation & Approximation | RoPE/sinusoidal PE, KAN B-splines, Runge's phenomenon, FFT, random Fourier features |
| Numerical Integration | Gaussian quadrature, Monte Carlo variance reduction, reparameterization trick (VAE ELBO) |
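One familiar instance of the floating-point concerns listed above is the log-sum-exp computation that sits inside softmax and cross-entropy. The example is chosen here for illustration and is not named in the chapter table: shifting by the maximum before exponentiating avoids overflow without changing the mathematical result.

```python
import numpy as np

def logsumexp_naive(x):
    # Direct formula; exp overflows to inf for large inputs
    with np.errstate(over="ignore"):
        return np.log(np.sum(np.exp(x)))

def logsumexp_stable(x):
    # log sum_i exp(x_i) = m + log sum_i exp(x_i - m), with m = max(x)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# Illustrative logits large enough that exp(x) exceeds the float64 range
x = np.array([1000.0, 1000.5, 999.0])

naive = logsumexp_naive(x)
stable = logsumexp_stable(x)
print(naive, stable)
```

The same shift-by-max idea matters even more in reduced-precision formats, where the representable range is far smaller than in float64.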
Specialized Mathematics
11 · Graph Theory: Graph algorithms, spectral methods, GNNs
| Topic | ML Connection |
| :--- | :--- |
| Graph Basics | Social networks, molecular graphs |
| Graph Representations | Adjacency/Laplacian matrices |
| Graph Algorithms | Shortest path, centrality, traversal |
| Spectral Graph Theory | Community detection, graph wavelets |
| Graph Neural Networks | Message passing, GCN, GAT |
| Random Graphs | Erdős-Rényi, network analysis |
12 · Functional Analysis: Hilbert spaces, kernel methods
| Topic | ML Connection |
| :--- | :--- |
| Normed Spaces | Regularization theory |
| Hilbert Spaces | RKHS, function space learning |
| Kernel Methods | SVM, Gaussian processes, kernel trick |
ML-Specific Mathematics
13 · ML-Specific Math: Loss functions, activations, normalization, sampling
| Topic | ML Connection |
| :--- | :--- |
| Loss Functions | MSE, cross-entropy, hinge, contrastive |
| Activation Functions | ReLU, GELU, sigmoid, softmax, and their gradients |
| Normalization Techniques | BatchNorm, LayerNorm, RMSNorm |
| Sampling Methods | MCMC, rejection sampling, importance sampling |
14 · Math for Specific Models: NNs, CNNs, RNNs, Transformers, GANs, RL
| Topic | ML Connection |
| :--- | :--- |
| Linear Models | Regression, classification foundations |
| Neural Networks | Universal approximation, backprop math |
| Probabilistic Models | GMMs, HMMs, variational inference |
| RNN & LSTM Math | Vanishing gradients, gating mechanisms |
| Transformer Architecture | "Attention is all you need": the math |
| Reinforcement Learning | Bellman equations, policy gradients |
| Generative Models | VAEs, GANs, diffusion models |
| CNN & Convolution Math | Convolution theorem, pooling, receptive fields |
LLM Mathematics
15 · Math for LLMs: Attention, embeddings, scaling laws, inference
| Topic | ML Connection |
| :--- | :--- |
| Tokenization Math | BPE, WordPiece and their information-theoretic foundations |
| Embedding Space Math | Geometric properties of learned representations |
| Attention Mechanism Math | Scaled dot-product, multi-head, causal masking |
| Positional Encodings | Sinusoidal, RoPE, ALiBi |
| Language Model Probability | Next-token prediction, perplexity |
| Training at Scale | Distributed training, gradient accumulation |
| Fine-Tuning Math | LoRA, adapters, parameter-efficient methods |
| Scaling Laws | Chinchilla, compute-optimal training |
| Efficient Attention & Inference | FlashAttention, KV-cache, speculative decoding |
| Mixture of Experts & Routing | Sparse gating, load balancing |
| Quantization & Distillation | INT8/INT4, knowledge distillation |
| RAG Math & Retrieval | Retrieval-augmented generation |
| Serving & Systems Tradeoffs | Latency, throughput, batching strategies |
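The attention row above names scaled dot-product attention with causal masking. A minimal single-head NumPy sketch follows; the shapes (4 tokens, key dimension 8) and the random `Q`, `K`, `V` are assumptions for illustration, and learned projections, batching, and multiple heads are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    # Q, K, V: arrays of shape (T, d_k) for one head and one sequence
    T, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                    # scaled dot products
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # True = future position
    scores = np.where(mask, -np.inf, scores)           # block attention to the future
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4, 8))   # three hypothetical (4, 8) matrices

out, weights = causal_attention(Q, K, V)
print(np.round(weights, 3))
```

The printed weight matrix is lower triangular: token t attends only to positions up to t, so the first output row is simply the first row of `V`.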
16 · LLM Training Data Pipeline: Data quality, deduplication, mixture optimization
| Topic | Description |
| :--- | :--- |
| Data Format Standards | JSONL, tokenized formats, schema validation |
| JSONL Generation | Efficient serialization for training |
| Quality Checks | Filtering, decontamination, toxicity |
| Full Dataset Assembly | Combining and balancing data sources |
| Contamination & Dedup Audits | Preventing benchmark leakage |
| Documentation & Governance | Data cards, provenance tracking |
| Data Mixture Optimization | Optimal domain ratios for training |
Evaluation, Safety & Production
17 · Evaluation & Reliability: Benchmarks, calibration, A/B testing
| Topic | Description |
| :--- | :--- |
| Capability Benchmarks | MMLU, HumanEval, evaluation methodology |
| Calibration & Uncertainty | Confidence vs. accuracy alignment |
| Robustness & Distribution Shift | Out-of-distribution detection |
| Error Analysis & Ablations | Systematic debugging |
| Online Experimentation & A/B Testing | Statistical rigor in deployment |
18 · Alignment & Safety: SFT, RLHF, DPO, red-teaming
| Topic | Description |
| :--- | :--- |
| Instruction Tuning & SFT | Supervised fine-tuning mathematics |
| Preference Optimization (RLHF & DPO) | Reward modeling, Bradley-Terry, DPO objective |
| Red-Teaming & Safety Evaluations | Adversarial robustness testing |
| Policy & Guardrails | Constitutional AI, rule-based filtering |
| Human-in-the-Loop & Monitoring | Active learning, feedback loops |
19 · Production ML & MLOps: Serving, monitoring, drift detection
| Topic | Description |
| :--- | :--- |
| Data Versioning & Lineage | Reproducibility at scale |
| Experiment Tracking | MLflow, W&B for systematic experimentation |
| Feature Stores & Data Contracts | Consistent feature engineering |
| Model Serving & Inference Optimization | Latency, batching, hardware |
| Monitoring, Drift & Retraining | Detecting degradation |
| LLM Evaluation, Observability & Guardrails | LLM-specific ops |
Advanced Theory
20 · Fourier Analysis & Signal Processing: FFT, wavelets, convolution theorem
| Topic | ML Connection |
| :--- | :--- |
| Fourier Series | Periodic signal decomposition |
| Fourier Transform | Frequency domain analysis |
| DFT & FFT | Efficient spectral computation |
| Convolution Theorem | CNNs in frequency domain |
| Wavelets | Multi-resolution analysis, time-frequency |
21 · Statistical Learning Theory: PAC learning, VC dimension, generalization
| Topic | ML Connection |
| :--- | :--- |
| PAC Learning | Learnability guarantees |
| VC Dimension | Model complexity measurement |
| Bias-Variance Tradeoff | The fundamental modeling tension |
| Generalization Bounds | Why models work on unseen data |
| Rademacher Complexity | Data-dependent complexity measures |
22 · Causal Inference: SCMs, do-calculus, counterfactuals
| Topic | ML Connection |
| :--- | :--- |
| Structural Causal Models | Beyond correlation |
| Do-Calculus | Interventional reasoning |
| Counterfactuals | "What if" reasoning |
| Causal Discovery | Learning causal structure from data |
23 · Game Theory: Nash equilibria, minimax, adversarial methods
| Topic | ML Connection |
| :--- | :--- |
| Nash Equilibria | GAN training dynamics |
| Minimax Theorem | Adversarial robustness |
| Multi-Agent Systems | Cooperative/competitive learning |
| Adversarial Game Theory | Security and robustness |
24 · Measure Theory: Sigma-algebras, Lebesgue integration, probability spaces
| Topic | ML Connection |
| :--- | :--- |
| Sigma-Algebras | Rigorous probability foundations |
| Lebesgue Integration | Expectation in continuous spaces |
| Probability Measure Spaces | Formal probability theory |
| Radon-Nikodym Theorem | Density ratios, importance sampling |
25 · Differential Geometry: Manifolds, Riemannian geometry, geodesics
| Topic | ML Connection |
| :--- | :--- |
| Manifolds | Data lies on low-dimensional manifolds |
| Riemannian Geometry | Natural gradient, information geometry |
| Geodesics | Shortest paths in curved spaces |
| Optimization on Manifolds | Constrained optimization on curved surfaces |
Resources
The docs/ folder contains supplementary references:
| Document | Description |
|---|---|
| ML Math Map | Visual guide: which math is used where in ML |
| Notation Guide | Consistent notation conventions across the course |
| Cheatsheet | Quick-reference formula sheet |
| Interview Prep | Common ML math interview questions with solutions |
| Visualization Guide | Tips for building mathematical intuition visually |
Tech Stack
| Tool | Purpose |
|---|---|
| Python 3.8+ | Primary language |
| NumPy / SciPy | Numerical computing |
| Matplotlib / Seaborn / Plotly | Visualizations |
| SymPy | Symbolic mathematics |
| Jupyter Lab | Interactive notebooks |
| scikit-learn | ML examples and demos |
Feedback & Suggestions
If you find typos or errors in formulas, or have ideas for new visualizations and practice exercises, please submit feedback.
Built for learners, researchers, and engineers who believe understanding the math makes you a better AI practitioner.
"In God we trust. All others must bring data." (W. Edwards Deming)