Matrix Norms and Condition Numbers
Appendix J: Quick Reference - Key Theorems
J.1 The Four Fundamental Theorems of Matrix Norms
Theorem 1 (Equivalence of Matrix Norms). On $\mathbb{R}^{m \times n}$ (finite-dimensional), all matrix norms are equivalent: for any two norms $\|\cdot\|_a$ and $\|\cdot\|_b$, there exist constants $0 < c \le C$ such that $c\,\|A\|_a \le \|A\|_b \le C\,\|A\|_a$ for all $A$.
Implication: Convergence (or divergence) in one norm implies convergence (or divergence) in all norms. Norms differ quantitatively, not qualitatively, in finite dimensions.
Theorem 2 (Eckart-Young). The best rank-$k$ approximation to $A$ in BOTH the Frobenius and spectral norms is the truncated SVD $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. The approximation errors are $\|A - A_k\|_2 = \sigma_{k+1}$ (spectral) and $\|A - A_k\|_F = \sqrt{\sum_{i > k} \sigma_i^2}$ (Frobenius).
Implication: Low-rank approximation is solved optimally by SVD. This underpins PCA, LoRA, MLA, and all SVD-based compression methods.
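Both error formulas are easy to verify numerically. The sketch below is a minimal NumPy illustration; the matrix size and the rank $k$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD = best rank-k approximation

# Eckart-Young error formulas
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                           # spectral error = sigma_{k+1}
print(np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2))))  # Frobenius error = tail energy
```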
Theorem 3 (Weyl's Inequality). Singular values are Lipschitz-1 functions of the matrix: $|\sigma_i(A) - \sigma_i(B)| \le \|A - B\|_2$ for all $i$.
Implication: Singular values are numerically stable - a small perturbation of the matrix changes each singular value by at most the perturbation's spectral norm. Eigenvalues of non-symmetric matrices do NOT have this guarantee.
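The bound can be checked directly (a minimal sketch; the perturbation scale $10^{-3}$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
E = 1e-3 * rng.standard_normal((6, 6))        # small perturbation

sA = np.linalg.svd(A, compute_uv=False)
sAE = np.linalg.svd(A + E, compute_uv=False)

# Weyl: |sigma_i(A + E) - sigma_i(A)| <= ||E||_2 for every index i
assert np.all(np.abs(sAE - sA) <= np.linalg.norm(E, 2))
```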
Theorem 4 (Von Neumann Trace Inequality). $|\operatorname{tr}(A^T B)| \le \sum_i \sigma_i(A)\,\sigma_i(B)$, with equality when $A$ and $B$ share singular vector bases.
Implication: The trace inner product is bounded by singular value inner products. This is the fundamental inequality behind nuclear-spectral duality and all Hölder-type bounds for matrix norms.
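Both the inequality and its equality condition can be checked numerically (a minimal sketch; the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((5, 4))

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)

# |<A, B>| = |tr(A^T B)| <= sum_i sigma_i(A) sigma_i(B)
assert abs(np.trace(A.T @ B)) <= np.sum(sA * sB)

# Equality when B shares A's singular vector bases
U, _, Vt = np.linalg.svd(A, full_matrices=False)
B_aligned = U @ np.diag(sB) @ Vt
assert np.isclose(np.trace(A.T @ B_aligned), np.sum(sA * sB))
```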
J.2 Key Formulas at a Glance
Norm relations for $A$ of rank $r$: $\|A\|_2 \le \|A\|_F \le \|A\|_* \le \sqrt{r}\,\|A\|_F \le r\,\|A\|_2$, and in particular $\|A\|_F \le \sqrt{r}\,\|A\|_2$.
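These relations are easy to verify on a matrix of known rank (a minimal sketch; building a rank-$r$ matrix as a product of thin factors is the standard construction):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 3
A = rng.standard_normal((7, r)) @ rng.standard_normal((r, 5))   # rank r by construction

spec = np.linalg.norm(A, 2)
frob = np.linalg.norm(A, 'fro')
nuc = np.linalg.norm(A, 'nuc')

# ||A||_2 <= ||A||_F <= ||A||_* <= sqrt(r) ||A||_F <= r ||A||_2
assert spec <= frob <= nuc <= np.sqrt(r) * frob <= r * spec
```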
Condition number: $\kappa_p(A) = \|A\|_p\,\|A^{-1}\|_p$ (for square nonsingular $A$); in the 2-norm, $\kappa_2(A) = \sigma_1/\sigma_n$. Each decimal digit of $\kappa$ costs one digit of precision in double arithmetic.
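The Hilbert matrix makes the digits-of-precision rule concrete (a minimal sketch; the size $n = 10$ is an arbitrary choice):

```python
import numpy as np

n = 10
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)   # Hilbert matrix: notoriously ill-conditioned

kappa = np.linalg.cond(H, 2)                 # sigma_1 / sigma_n
x_true = np.ones(n)
x_hat = np.linalg.solve(H, H @ x_true)       # solve a system whose exact answer we know

print(f"log10(kappa):   {np.log10(kappa):.1f}")                             # ~13
print(f"correct digits: {-np.log10(np.abs(x_hat - x_true).max()):.1f}")     # roughly 16 - log10(kappa)
```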
Proximal operators:
- $\operatorname{prox}_{\tau\|\cdot\|_*}(A) = U\,\operatorname{diag}\big((\sigma_i - \tau)_+\big)\,V^T$ (nuclear norm -> SVT; sketched below)
- $\operatorname{prox}_{\tau\,\frac{1}{2}\|\cdot\|_F^2}(A) = \frac{1}{1+\tau}\,A$ (Frobenius squared -> scaling)
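A direct implementation of the SVT prox (a minimal sketch; the threshold $\tau = 0.5$ is an arbitrary choice, and the final assert is only a weak sanity check, not a proof of optimality):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: prox of tau * ||.||_* at A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 5))
tau = 0.5
X = svt(A, tau)

# Sanity check: the prox objective at X should not exceed its value at A itself
obj = lambda Z: 0.5 * np.linalg.norm(Z - A, 'fro')**2 + tau * np.linalg.norm(Z, 'nuc')
assert obj(X) <= obj(A)
```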
Dual norm pairs: spectral <-> nuclear; matrix 1-norm <-> matrix $\infty$-norm; Frobenius is self-dual.
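The spectral-nuclear pairing can be checked by exhibiting the maximizer $B = UV^T$ explicitly (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 4))

# ||A||_* = max { <A, B> : ||B||_2 <= 1 }, attained at B = U V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U @ Vt

assert np.isclose(np.linalg.norm(B, 2), 1.0)                     # B is feasible: spectral norm 1
assert np.isclose(np.trace(A.T @ B), np.linalg.norm(A, 'nuc'))   # and attains the maximum
```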
Stable rank: $\operatorname{srank}(A) = \|A\|_F^2 / \|A\|_2^2$ - a smooth proxy for rank that controls generalization bounds.
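A small experiment shows why stable rank is the better-behaved quantity (a minimal sketch; the rank and noise scale are arbitrary choices):

```python
import numpy as np

def stable_rank(A):
    return np.linalg.norm(A, 'fro')**2 / np.linalg.norm(A, 2)**2

rng = np.random.default_rng(6)
L = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))   # exactly rank 3
noisy = L + 1e-2 * rng.standard_normal((50, 40))                  # numerically full rank

print(np.linalg.matrix_rank(noisy))         # 40: hard rank jumps under tiny noise
print(stable_rank(L), stable_rank(noisy))   # nearly identical: the smooth proxy barely moves
```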
J.3 Notation Reference
Following the project notation guide (docs/NOTATION_GUIDE.md):
| Symbol | Meaning |
|---|---|
| $\|A\|_F$ | Frobenius norm of matrix $A$ |
| $\|A\|_2$ | Spectral (operator 2-) norm of $A$; equals $\sigma_1(A)$ |
| $\|A\|_*$ | Nuclear (trace) norm of $A$; equals $\sum_i \sigma_i(A)$ |
| $\|A\|_1$ | Matrix 1-norm (max absolute column sum) |
| $\|A\|_\infty$ | Matrix $\infty$-norm (max absolute row sum) |
| $\|A\|_{S_p}$ | Schatten $p$-norm; the vector $p$-norm applied to singular values |
| $\|A\|_{(k)}$ | Ky Fan $k$-norm; sum of top-$k$ singular values |
| $\kappa_p(A)$ | Condition number in the $p$-norm: $\|A\|_p\,\|A^{-1}\|_p$ |
| $\operatorname{srank}(A)$ | Stable rank: $\|A\|_F^2 / \|A\|_2^2$ |
| $\sigma_i(A)$ | $i$-th singular value (in decreasing order) |
| $\sigma(A)$ | Vector of all singular values |
| $\langle A, B \rangle$ | Frobenius inner product: $\operatorname{tr}(A^T B)$ |
Appendix K: Further Reading
K.1 Textbooks
- Golub & Van Loan - Matrix Computations (4th ed., 2013). The definitive numerical linear algebra reference. Chapters 2-3 cover matrix norms and condition numbers in full depth.
- Horn & Johnson - Matrix Analysis (2nd ed., 2013). Comprehensive theoretical treatment of matrix norms, singular values, and inequalities.
- Bhatia - Matrix Analysis (1997). Advanced treatment of matrix inequalities, Schatten norms, and majorization. Chapter IV covers unitarily invariant norms.
- Trefethen & Bau - Numerical Linear Algebra (1997). Excellent pedagogical treatment. Lecture 3 covers norms; Lectures 12-15 cover conditioning and stability.
K.2 Foundational Papers
- Eckart & Young (1936) - "The approximation of one matrix by another of lower rank." Psychometrika. The Eckart-Young theorem.
- Mirsky (1960) - "Symmetric gauge functions and unitarily invariant norms." Quarterly Journal of Mathematics. Unitarily invariant norm characterization via symmetric gauge functions.
- Candès & Recht (2009) - "Exact matrix completion via convex optimization." Foundations of Computational Mathematics. Nuclear norm for matrix recovery under incoherence.
- Cai, Candès & Shen (2010) - "A singular value thresholding algorithm for matrix completion." SIAM Journal on Optimization. The SVT proximal algorithm.
K.3 Machine Learning Papers
- Miyato et al. (2018) - "Spectral Normalization for Generative Adversarial Networks." ICLR 2018. Spectral norm in GAN training; power iteration for $\sigma_1$.
- Bartlett, Foster & Telgarsky (2017) - "Spectrally-normalized margin bounds for neural networks." NeurIPS 2017. Margin-based generalization bounds via products of spectral norms.
- Gunasekar et al. (2017) - "Implicit Regularization in Matrix Factorization." NeurIPS 2017. Gradient flow on factored parameterization implicitly minimizes nuclear norm.
- Hu et al. (2021) - "LoRA: Low-Rank Adaptation of Large Language Models." ICLR 2022. Low-rank weight adaptation; implicit nuclear norm regularization.
- Dong et al. (2021) - "Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth." ICML 2021. Attention matrix rank analysis via spectral/nuclear norms.
- DeepSeek-AI (2024) - "DeepSeek-V2." MLA architecture and nuclear norm-motivated KV cache compression.
K.4 Online Resources
- Gilbert Strang's Linear Algebra lectures (MIT OpenCourseWare 18.06): Lectures 29-33 cover SVD, norms, and condition numbers with worked examples.
- Numerical Linear Algebra (Trefethen, Oxford): Freely available course notes that complement the textbook above.
- Matrix Cookbook (Petersen & Pedersen, 2012): Dense reference for matrix identities and norm formulas. Freely available as a PDF.
- Convex Optimization (Boyd & Vandenberghe, Stanford): norms as convex functions appear in Chapter 3 and Appendix A; for proximal methods including SVT, see the companion monograph Proximal Algorithms (Parikh & Boyd, 2014).
This section is part of the Math for LLMs curriculum. All notation follows docs/NOTATION_GUIDE.md. For visualization standards, see docs/VISUALIZATION_GUIDE.md.
<- Back to Advanced Linear Algebra | Next: Linear Transformations ->
Summary
Matrix norms are the central quantitative tools of matrix analysis. This section covered:
Core norm families - The five principal matrix norms (Frobenius, spectral, nuclear, matrix-1, matrix-$\infty$) and the broader Schatten family that unifies them. Each norm captures a different geometric property of a linear map: the Frobenius norm measures RMS stretching; the spectral norm measures maximum stretching; the nuclear norm measures total stretching (sum of singular values).
Induced vs. non-induced - The spectral, matrix-1, and matrix-$\infty$ norms are induced (they arise from vector norms on input/output spaces). The Frobenius norm is not induced but is compatible. Every induced norm is submultiplicative; submultiplicativity is the key property for stability analysis.
Unitarily invariant norms - Norms that depend only on singular values (Frobenius, spectral, nuclear, Schatten, Ky Fan) have a unified theory via symmetric gauge functions. The Von Neumann trace inequality and Mirsky's theorem are the central results.
Condition number - The ratio $\kappa_2(A) = \sigma_1/\sigma_n$ quantifies sensitivity to perturbations. Every decimal digit of $\kappa$ costs one digit of precision. Tikhonov regularization reduces $\kappa$ at the cost of introducing systematic bias.
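For instance, adding $\lambda I$ to the normal equations caps the condition number at roughly $\sigma_1^2/\lambda$ (a minimal sketch; the Hilbert matrix and $\lambda = 10^{-6}$ are arbitrary illustrative choices):

```python
import numpy as np

n = 8
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)   # ill-conditioned test matrix

lam = 1e-6
print(f"{np.linalg.cond(H.T @ H):.2e}")                    # enormous: forming H^T H squares the conditioning
print(f"{np.linalg.cond(H.T @ H + lam * np.eye(n)):.2e}")  # roughly sigma_1(H)^2 / lam ~ 3e6
```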
Perturbation theory - Weyl's inequality (singular values are Lipschitz-1 w.r.t. spectral norm) guarantees stability of SVD computations. The Bauer-Fike theorem shows that non-normal eigenvalues can be much less stable.
Machine learning connections - Every major modern ML technique has a matrix norm interpretation: weight decay (Frobenius), spectral normalization (spectral), LoRA (nuclear via factored form), gradient clipping (spectral or Frobenius of gradients), PAC-Bayes bounds (stable rank and Frobenius/spectral), MLA compression (Eckart-Young), and attention analysis (nuclear norm rank collapse).
The key unifying insight: matrix norms reduce infinite-dimensional questions about linear operators to finite-dimensional scalar measurements. This reduction is the mathematical move that makes analysis tractable, computation feasible, and regularization principled.
This understanding of norms as measurement instruments prepares us for the next section on Linear Transformations, where we use norms to study how maps between vector spaces can be classified, composed, and analyzed. The spectral norm of a transformation matrix is precisely its operator norm - the fundamental quantity controlling how much the transformation can stretch vectors. Every topic in the remaining curriculum - optimization convergence (Chapter 8), probabilistic models (Chapter 10), and the mathematics of specific architectures (Chapter 14) - uses matrix norms as a core tool.
The journey from abstract norm axioms to practical tools - power iteration, singular value thresholding, Tikhonov regularization, spectral normalization - illustrates how pure mathematics becomes engineering. Matrix norms are not just theoretical objects but the computational primitives of modern AI.
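As one example of that arc, the power-iteration estimate of the spectral norm used by spectral normalization fits in a few lines (a minimal sketch; production implementations typically persist $u, v$ across training steps and run a single iteration per step):

```python
import numpy as np

def spectral_norm(W, n_iters=100, seed=0):
    """Estimate sigma_1(W) by alternating power iteration, as in spectral normalization."""
    v = np.random.default_rng(seed).standard_normal(W.shape[1])
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)          # Rayleigh-quotient estimate of sigma_1

W = np.random.default_rng(7).standard_normal((64, 32))
print(spectral_norm(W), np.linalg.norm(W, 2))   # the two values agree to several digits
```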
Matrix norms connect every part of linear algebra - the SVD gives their values, orthogonality explains their invariance, eigenvalues govern condition numbers - and every part of machine learning - norms define regularizers, bound generalization, and stabilize training. They are the language in which the theory and practice of modern AI is written.