Math for LLMs: Advanced Linear Algebra / Eigenvalues and Eigenvectors

Converted from theory.ipynb for web reading.

Eigenvalues and Eigenvectors — Theory Notebook

"The eigenvalues of a matrix are not just numbers — they are the heartbeat of the linear map, the frequencies at which it resonates, the rates at which it remembers and forgets."

Interactive demonstrations covering: geometric intuition, characteristic polynomial, diagonalisation, spectral theorem, Jordan form, SVD connection, PCA, gradient descent, Hessian spectrum, attention eigenstructure, graph Laplacians, spectral normalisation.

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
import numpy.linalg as la
from scipy import linalg as sla

try:
    import matplotlib.pyplot as plt
    import matplotlib
    plt.style.use('seaborn-v0_8-whitegrid')
    plt.rcParams['figure.figsize'] = [10, 6]
    plt.rcParams['font.size'] = 12
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

try:
    import seaborn as sns
    HAS_SNS = True
except ImportError:
    HAS_SNS = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)
print('Setup complete. NumPy:', np.__version__)

1. Intuition — Geometric Visualisation

An eigenvector of $A$ is a direction preserved by the map: $A\mathbf{v} = \lambda\mathbf{v}$. We visualise what a 2x2 matrix does to every unit vector, and identify the eigenvectors as the directions that land on the same ray.

Code cell 5

# === 1.1 Geometric Action of a 2x2 Matrix ===

A = np.array([[3., 1.], [1., 3.]])
eigvals, eigvecs = la.eigh(A)  # symmetric so eigh is exact

print('Matrix A:')
print(A)
print(f'\nEigenvalues: {eigvals}')
print(f'Eigenvector 1: {eigvecs[:,0]}')
print(f'Eigenvector 2: {eigvecs[:,1]}')

# Verify Av = lambda*v
for i in range(2):
    lhs = A @ eigvecs[:, i]
    rhs = eigvals[i] * eigvecs[:, i]
    ok = np.allclose(lhs, rhs)
    print(f'PASS - Av = lambda*v for eigenvalue {eigvals[i]:.4f}' if ok else 'FAIL')

if HAS_MPL:
    theta = np.linspace(0, 2*np.pi, 300)
    circle = np.vstack([np.cos(theta), np.sin(theta)])
    ellipse = A @ circle

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Left: unit circle with eigenvectors
    ax = axes[0]
    ax.plot(circle[0], circle[1], 'b-', alpha=0.5, label='Unit circle')
    colors = ['red', 'green']
    for i in range(2):
        v = eigvecs[:, i]
        ax.annotate('', xy=v, xytext=(0,0),
                    arrowprops=dict(arrowstyle='->', color=colors[i], lw=2))
        ax.annotate('', xy=-v, xytext=(0,0),
                    arrowprops=dict(arrowstyle='->', color=colors[i], lw=2, linestyle='dashed'))
        ax.text(v[0]*1.1, v[1]*1.1, f'v{i+1}', fontsize=12, color=colors[i])
    ax.set_xlim(-1.5, 1.5); ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal'); ax.set_title('Input: unit circle + eigenvectors')
    ax.legend()

    # Right: ellipse (image of unit circle) with scaled eigenvectors
    ax = axes[1]
    ax.plot(ellipse[0], ellipse[1], 'b-', alpha=0.5, label='Image A(circle)')
    for i in range(2):
        av = eigvals[i] * eigvecs[:, i]
        ax.annotate('', xy=av, xytext=(0,0),
                    arrowprops=dict(arrowstyle='->', color=colors[i], lw=2))
        ax.text(av[0]*1.05, av[1]*1.05, f'lambda{i+1}*v{i+1}', fontsize=10, color=colors[i])
    ax.set_aspect('equal'); ax.set_title('Output: ellipse (semi-axes = |eigenvalues|)')
    ax.legend()

    plt.suptitle('Eigenvalues = semi-axes of the output ellipse', fontsize=14)
    plt.tight_layout()
    plt.show()
    print('Plot: eigenvectors are the axes of the output ellipse')

2. Formal Definitions

2.1–2.2 The Eigenvalue Equation and Characteristic Polynomial

$\lambda$ is an eigenvalue iff $\det(\lambda I - A) = 0$. We compute characteristic polynomials for 2x2 and 3x3 matrices.

Code cell 7

# === 2.2 Characteristic Polynomial ===

# 2x2 example
A2 = np.array([[4., 2.], [1., 3.]])
tr_A = np.trace(A2)
det_A = la.det(A2)
disc = tr_A**2 - 4*det_A

lam1 = (tr_A + np.sqrt(disc)) / 2
lam2 = (tr_A - np.sqrt(disc)) / 2

print('2x2 Matrix:')
print(A2)
print(f'tr(A) = {tr_A},  det(A) = {det_A},  discriminant = {disc:.4f}')
print(f'Eigenvalues: lambda1 = {lam1:.6f}, lambda2 = {lam2:.6f}')
print(f'Sum check:     lambda1 + lambda2 = {lam1+lam2:.6f} (should be tr={tr_A})')
print(f'Product check: lambda1 * lambda2 = {lam1*lam2:.6f} (should be det={det_A})')

# Verify with numpy
evals_np = la.eigvals(A2)
evals_np_sorted = np.sort(evals_np.real)[::-1]
our_sorted = np.sort([lam1, lam2])[::-1]
ok = np.allclose(evals_np_sorted, our_sorted)
print(f'\nPASS - matches numpy eigenvalues' if ok else 'FAIL - mismatch with numpy')

# 3x3 upper triangular (eigenvalues = diagonal entries)
A3 = np.array([[2., 1., 0.], [0., 2., 1.], [0., 0., 3.]])
print(f'\n3x3 triangular matrix eigenvalues: {np.sort(la.eigvals(A3).real)[::-1]}')
print(f'Diagonal entries (= eigenvalues):  {np.diag(A3)}')
ok3 = np.allclose(np.sort(la.eigvals(A3).real)[::-1], [3., 2., 2.])
print(f'PASS - triangular: eigenvalues = diagonal' if ok3 else 'FAIL')

2.3 Algebraic vs Geometric Multiplicity

For $A = \begin{pmatrix}2&1&0\\0&2&1\\0&0&3\end{pmatrix}$: eigenvalue $\lambda=2$ has AM=2 but GM=1 (defective). We verify by checking the rank of $(A - 2I)$.

Code cell 9

# === 2.3 Algebraic vs Geometric Multiplicity ===

A = np.array([[2., 1., 0.], [0., 2., 1.], [0., 0., 3.]])

for lam, am in [(2.0, 2), (3.0, 1)]:
    shift = A - lam * np.eye(3)
    rnk = np.linalg.matrix_rank(shift)
    gm = 3 - rnk  # nullity = n - rank
    defective = 'DEFECTIVE' if gm < am else 'non-defective'
    print(f'lambda={lam}: AM={am}, rank(A-lI)={rnk}, GM={gm}  --> {defective}')

print('\nEigenvectors from numpy:')
evals, evecs = la.eig(A)
for i, (ev, vec) in enumerate(zip(evals, evecs.T)):
    residual = la.norm(A @ vec - ev * vec)
    print(f'  lambda={ev.real:.2f}, v={vec.real}, residual={residual:.2e}')

3. Computing Eigenvalues and Eigenvectors

3.6 Power Iteration

Power iteration converges to the dominant eigenvector at rate $|\lambda_2/\lambda_1|$ per step.

Code cell 11

# === 3.6 Power Iteration ===

np.random.seed(42)
n = 6
# Build symmetric PD matrix with known eigenvalues
Q, _ = la.qr(np.random.randn(n, n))
true_evals = np.array([10., 5., 3., 2., 1., 0.5])
A = Q @ np.diag(true_evals) @ Q.T

# Power iteration
x = np.random.randn(n); x /= la.norm(x)
errors = []
lam_estimates = []

for k in range(40):
    y = A @ x
    x_new = y / la.norm(y)
    lam_est = x_new @ A @ x_new  # Rayleigh quotient
    errors.append(abs(lam_est - true_evals[0]))
    lam_estimates.append(lam_est)
    x = x_new

theoretical_rate = true_evals[1] / true_evals[0]  # |lambda2/lambda1|
print(f'True dominant eigenvalue:    {true_evals[0]}')
print(f'Power iteration estimate:    {lam_estimates[-1]:.8f}')
print(f'Final error:                 {errors[-1]:.2e}')
print(f'Theoretical convergence rate: {theoretical_rate:.4f}')

if HAS_MPL:
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    axes[0].semilogy(errors, 'b-o', markersize=4)
    axes[0].semilogy(range(40), [errors[0] * theoretical_rate**k for k in range(40)],
                     'r--', label=f'Theoretical rate={theoretical_rate:.2f}')
    axes[0].set_xlabel('Iteration'); axes[0].set_ylabel('|error|')
    axes[0].set_title('Power Iteration Convergence'); axes[0].legend()

    axes[1].plot(lam_estimates, 'g-o', markersize=4)
    axes[1].axhline(true_evals[0], color='red', linestyle='--', label='True value')
    axes[1].set_xlabel('Iteration'); axes[1].set_ylabel('lambda estimate')
    axes[1].set_title('Rayleigh Quotient Convergence'); axes[1].legend()

    plt.tight_layout(); plt.show()
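Power iteration only reaches the dominant eigenvalue. A standard companion, sketched here as a supplement (not part of the original cell), is shifted inverse iteration: power-iterating $(A - \sigma I)^{-1}$ converges to the eigenvalue nearest a chosen shift $\sigma$, using the same test matrix construction as above.

```python
import numpy as np
import numpy.linalg as la

np.random.seed(42)
n = 6
# Same construction as the cell above: symmetric matrix with known spectrum.
Q, _ = la.qr(np.random.randn(n, n))
true_evals = np.array([10., 5., 3., 2., 1., 0.5])
A = Q @ np.diag(true_evals) @ Q.T

def inverse_iteration(A, sigma, iters=50):
    """Find the eigenvalue of A nearest the shift sigma.

    Power iteration on (A - sigma*I)^{-1}: its dominant eigenvalue is
    1/(lambda_near - sigma), so the iterate aligns with the eigenvector
    whose eigenvalue lies closest to sigma.
    """
    M = A - sigma * np.eye(A.shape[0])
    x = np.random.randn(A.shape[0])
    x /= la.norm(x)
    for _ in range(iters):
        x = la.solve(M, x)   # one linear solve per step, never form M^{-1}
        x /= la.norm(x)
    return x @ A @ x         # Rayleigh quotient recovers lambda itself

lam_near = inverse_iteration(A, sigma=2.7)
print(f'Eigenvalue nearest 2.7: {lam_near:.6f}  (true: 3.0)')
```

The convergence ratio here is $|\lambda_{\text{near}} - \sigma| / |\lambda_{\text{next}} - \sigma|$, so a shift close to the target eigenvalue converges very fast.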

4. Properties of Eigenvalues

4.1 Trace/Determinant, 4.4 Rayleigh Quotient, 4.6 Gershgorin Circles

Code cell 13

# === 4.1 Trace and Determinant Relations ===

A = np.array([[4., 2., 1.], [2., 5., 3.], [1., 3., 6.]])
evals = la.eigvalsh(A)  # symmetric, all real

print('Matrix A:')
print(A)
print(f'\nEigenvalues: {evals}')
print(f'Sum of eigenvalues: {evals.sum():.6f}')
print(f'Trace of A:         {np.trace(A):.6f}')
print(f'PASS trace = sum(lambda)' if np.isclose(evals.sum(), np.trace(A)) else 'FAIL')
print(f'\nProduct of eigenvalues: {evals.prod():.6f}')
print(f'det(A):                 {la.det(A):.6f}')
print(f'PASS det = prod(lambda)' if np.isclose(evals.prod(), la.det(A)) else 'FAIL')

Code cell 14

# === 4.4 Rayleigh Quotient ===

A = np.array([[4., 2.], [2., 1.]])
evals, evecs = la.eigh(A)
lam_min, lam_max = evals[0], evals[1]
print(f'Eigenvalues: {evals}')
print(f'lambda_min = {lam_min:.4f}, lambda_max = {lam_max:.4f}')

test_vectors = [
    ('min eigenvec', evecs[:, 0]),
    ('max eigenvec', evecs[:, 1]),
    ('(1,0)',        np.array([1., 0.])),
    ('(1,1)/sqrt2',  np.array([1., 1.]) / np.sqrt(2)),
    ('random',       np.random.randn(2)),
]

print('\nRayleigh quotient R(A, x) = xT A x / xT x:')
for name, x in test_vectors:
    x = x / la.norm(x)
    R = x @ A @ x
    bounded = lam_min <= R + 1e-10 and R <= lam_max + 1e-10
    print(f'  {name:15s}: R = {R:.4f}  in [{lam_min:.4f}, {lam_max:.4f}]? {"YES" if bounded else "NO"}')

Code cell 15

# === 4.6 Gershgorin Circle Theorem ===

A = np.array([[5., 0.5, 0.2],
              [0.3, 3., 0.4],
              [0.1, 0.6, 8.]])

evals = la.eigvals(A)
centres = np.diag(A).real
radii = np.array([sum(abs(A[i, j]) for j in range(3) if j != i) for i in range(3)])

print('Gershgorin discs (centre, radius):')
for i, (c, r) in enumerate(zip(centres, radii)):
    print(f'  D{i+1}: centre={c:.2f}, radius={r:.2f}, disc=[{c-r:.2f}, {c+r:.2f}]')

print('\nActual eigenvalues:')
for ev in sorted(evals.real):
    print(f'  lambda = {ev:.6f}')

if HAS_MPL:
    fig, ax = plt.subplots(figsize=(8, 6))
    colors = ['blue', 'green', 'orange']
    for i, (c, r, col) in enumerate(zip(centres, radii, colors)):
        circle = plt.Circle((c, 0), r, fill=False, color=col, linewidth=2,
                            label=f'D{i+1}: c={c}, r={r:.1f}')
        ax.add_patch(circle)
    for ev in evals:
        ax.plot(ev.real, ev.imag, 'rx', markersize=12, markeredgewidth=3)
    ax.set_xlim(0, 10); ax.set_ylim(-4, 4)
    ax.set_aspect('equal'); ax.legend()
    ax.set_title('Gershgorin Circles (x = actual eigenvalues)')
    ax.set_xlabel('Real'); ax.set_ylabel('Imaginary')
    plt.tight_layout(); plt.show()
    print('All eigenvalues (x) lie inside union of Gershgorin discs')

5. Diagonalisation

$A = V\Lambda V^{-1}$. We construct $V$ and $\Lambda$, verify the factorisation, and use it to compute matrix powers efficiently.

Code cell 17

# === 5.1 Diagonalisation Construction and Verification ===

A = np.array([[3., 1.], [1., 3.]])
evals, evecs = la.eig(A)

V = evecs            # columns = eigenvectors
Lam = np.diag(evals.real)
V_inv = la.inv(V)

A_reconstructed = V @ Lam @ V_inv
print('A original:')
print(A)
print('\nA = V Lambda V^-1:')
print(A_reconstructed.real)
print(f'\nPASS - reconstruction error: {la.norm(A - A_reconstructed.real):.2e}'
      if np.allclose(A, A_reconstructed.real) else 'FAIL')

# Symmetric so V is orthogonal: V^-1 = V^T
evals_s, evecs_s = la.eigh(A)  # symmetric version
print(f'\nFor symmetric A: V^T V = I?')
print(evecs_s.T @ evecs_s)
print(f'PASS - V is orthogonal' if np.allclose(evecs_s.T @ evecs_s, np.eye(2)) else 'FAIL')

Code cell 18

# === 5.3 Matrix Powers via Eigendecomposition ===

A = np.array([[3., 1.], [1., 3.]])
evals, V = la.eigh(A)
V_inv = V.T  # orthogonal

k = 20
# Fast: O(n^2) via eigendecomposition
Ak_fast = V @ np.diag(evals**k) @ V_inv

# Reference: direct computation
Ak_ref = la.matrix_power(A, k)

print(f'A^{k} via eigendecomposition:')
print(Ak_fast)
print(f'\nA^{k} via matrix_power (reference):')
print(Ak_ref)
print(f'\nPASS - match to {la.norm(Ak_fast - Ak_ref):.2e} error'
      if np.allclose(Ak_fast, Ak_ref) else 'FAIL')

# Fibonacci via eigendecomposition
F = np.array([[1., 1.], [1., 0.]])
evals_f, V_f = la.eig(F)
phi = max(evals_f.real)
print(f'\nFibonacci dominant eigenvalue (golden ratio): {phi:.8f}')
print(f'(1 + sqrt(5)) / 2                           : {(1 + np.sqrt(5))/2:.8f}')
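The cell above only recovers the golden ratio; as a sketch of the full computation (a matrix form of Binet's formula, not in the original cell), $F^n = V\,\mathrm{diag}(\lambda_i^n)\,V^{-1}$ yields the $n$-th Fibonacci number directly:

```python
import numpy as np
import numpy.linalg as la

F = np.array([[1., 1.], [1., 0.]])
evals, V = la.eig(F)

def fib_eigen(n):
    """n-th Fibonacci number via F^n = V diag(evals^n) V^{-1}.

    F^n = [[F_{n+1}, F_n], [F_n, F_{n-1}]], so read off entry (0, 1).
    """
    Fn = (V @ np.diag(evals**n) @ la.inv(V)).real
    return round(Fn[0, 1])

print([fib_eigen(n) for n in range(1, 11)])

# Cross-check against the iterative recurrence.
a, b = 0, 1
for n in range(1, 11):
    assert fib_eigen(n) == b
    a, b = b, a + b
print('PASS - eigendecomposition matches the recurrence')
```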

5.4 Matrix Functions

$f(A) = V f(\Lambda) V^{-1}$: apply $f$ to each eigenvalue. We compute the matrix exponential and the log-determinant.

Code cell 20

# === 5.4 Matrix Functions via Eigendecomposition ===

from scipy.linalg import expm, logm

A = np.array([[2., 1.], [1., 2.]])
evals, V = la.eigh(A)

# Matrix exponential via eigendecomposition
expA_eigen = V @ np.diag(np.exp(evals)) @ V.T
expA_scipy = expm(A)
print('exp(A) via eigendecomposition:')
print(expA_eigen)
print('exp(A) via scipy.linalg.expm:')
print(expA_scipy)
print(f'PASS - match' if np.allclose(expA_eigen, expA_scipy) else 'FAIL')

# Log-determinant: log det(A) = sum(log eigenvalues)
logdet_eigen = np.sum(np.log(evals))
logdet_direct = np.log(la.det(A))
print(f'\nlog det(A) via eigenvalues: {logdet_eigen:.8f}')
print(f'log det(A) directly:        {logdet_direct:.8f}')
print(f'PASS - log det' if np.isclose(logdet_eigen, logdet_direct) else 'FAIL')

# Condition number
kappa = max(evals) / min(evals)
print(f'\nCondition number kappa2(A) = lambda_max/lambda_min = {kappa:.4f}')
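Another instance of $f(A) = V f(\Lambda) V^{-1}$, sketched here as a complement to the cell above: the principal square root of an SPD matrix, obtained by applying $\sqrt{\cdot}$ to each eigenvalue.

```python
import numpy as np
import numpy.linalg as la

A = np.array([[2., 1.], [1., 2.]])
evals, V = la.eigh(A)  # SPD: eigenvalues positive, V orthogonal

# Matrix square root: take sqrt of each eigenvalue, rotate back.
sqrtA = V @ np.diag(np.sqrt(evals)) @ V.T
print('A^(1/2):')
print(sqrtA)

# Check the defining property (A^(1/2))^2 = A.
print(f'Squared-back error: {la.norm(sqrtA @ sqrtA - A):.2e}')
```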

6. The Spectral Theorem

$A = Q\Lambda Q^\top$ for symmetric $A$. We verify all spectral theorem properties and visualise the spectral decomposition.

Code cell 22

# === 6.1-6.3 Spectral Theorem and Decomposition ===

A = np.array([[4., 2., 1.], [2., 5., 3.], [1., 3., 6.]])
evals, Q = la.eigh(A)  # eigh for symmetric: guaranteed real evals, orthogonal Q

print('Symmetric matrix A:')
print(A)
print(f'\nEigenvalues (all real): {evals}')
print(f'All eigenvalues real?  {np.all(np.isreal(evals))}')

# Verify orthonormality of eigenvectors
QtQ = Q.T @ Q
print(f'\nQ^T Q (should be I):')
print(QtQ)
print(f'PASS - Q is orthogonal' if np.allclose(QtQ, np.eye(3)) else 'FAIL')

# Spectral decomposition: A = sum lambda_i q_i q_i^T
A_reconstructed = sum(evals[i] * np.outer(Q[:, i], Q[:, i]) for i in range(3))
print(f'\nSpectral decomposition error: {la.norm(A - A_reconstructed):.2e}')
print(f'PASS - A = sum lambda_i q_i q_i^T' if np.allclose(A, A_reconstructed) else 'FAIL')

# Check each rank-1 term
print('\nRank-1 terms (lambda_i * q_i q_i^T):')
for i in range(3):
    term = evals[i] * np.outer(Q[:, i], Q[:, i])
    print(f'  Term {i+1}: lambda={evals[i]:.4f}, rank={np.linalg.matrix_rank(term)}')

Code cell 23

# === 6.4 Positive Definite Matrices via Eigenvalues ===

matrices = {
    'PD:  [[2,1],[1,2]]':     np.array([[2., 1.], [1., 2.]]),
    'PSD: [[1,1],[1,1]]':     np.array([[1., 1.], [1., 1.]]),
    'Indef: [[2,-3],[-3,1]]': np.array([[2., -3.], [-3., 1.]]),
}

for name, M in matrices.items():
    evals = la.eigvalsh(M)
    if all(ev > 1e-10 for ev in evals):
        classification = 'Positive Definite'
    elif all(ev >= -1e-10 for ev in evals):
        classification = 'Positive Semidefinite'
    elif all(ev < -1e-10 for ev in evals):
        classification = 'Negative Definite'
    else:
        classification = 'Indefinite (has + and - eigenvalues)'
    kappa = max(abs(evals)) / (min(abs(evals)) + 1e-15)
    print(f'{name}:\n  evals={evals}, class={classification}, kappa={kappa:.2f}\n')
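The eigenvalue classification above can be cross-checked (a sketch assuming only NumPy) against Cholesky factorisation, which succeeds exactly when the matrix is symmetric positive definite:

```python
import numpy as np
import numpy.linalg as la

def is_pd_cholesky(M):
    """Cholesky factorisation succeeds iff M is symmetric positive definite."""
    try:
        la.cholesky(M)
        return True
    except la.LinAlgError:
        return False

for name, M in [('PD',    np.array([[2., 1.], [1., 2.]])),
                ('PSD',   np.array([[1., 1.], [1., 1.]])),
                ('Indef', np.array([[2., -3.], [-3., 1.]]))]:
    by_eig = bool(np.all(la.eigvalsh(M) > 1e-10))
    by_chol = is_pd_cholesky(M)
    print(f'{name}: eigenvalue test -> {by_eig}, Cholesky -> {by_chol}')
    assert by_eig == by_chol
```

Note that Cholesky rejects the PSD case too: a zero pivot is not strictly positive, which matches the strict eigenvalue test used here.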

7. Jordan Normal Form

For defective matrices, diagonalisation fails. We compute Jordan blocks and verify powers of Jordan blocks show polynomial growth.

Code cell 25

# === 7.2-7.4 Jordan Blocks and Their Powers ===

# Defective matrix: A = [[2,1,0],[0,2,1],[0,0,3]]
A = np.array([[2., 1., 0.], [0., 2., 1.], [0., 0., 3.]])

# Attempt diagonalisation via eig
evals, evecs = la.eig(A)
print('Eigenvalues:', evals.real)
print('Eigenvectors (columns of V):')
print(evecs.real)

# Check defectiveness: rank of (A - 2I)
A_shift = A - 2*np.eye(3)
rank_shift = np.linalg.matrix_rank(A_shift)
gm = 3 - rank_shift
print(f'\nFor lambda=2: rank(A-2I)={rank_shift}, GM={gm}, AM=2  --> DEFECTIVE: {gm < 2}')

# Jordan block J_2(1): powers show polynomial growth at spectral radius = 1
print('\nPowers of Jordan block J_2(1) = [[1,1],[0,1]]:')
J = np.array([[1., 1.], [0., 1.]])
for t in [1, 5, 10, 20]:
    Jt = la.matrix_power(J, t)
    print(f'  J^{t:2d} = {Jt}  (off-diagonal = {t}, grows linearly)')

# Compare: block with |lambda|=0.99 (stable)
print('\nPowers of J_2(0.99) = [[0.99,1],[0,0.99]]: entries decay (0.99^t * t)')
J_stable = np.array([[0.99, 1.], [0., 0.99]])
for t in [1, 10, 50, 100]:
    Jt = la.matrix_power(J_stable, t)
    print(f'  J^{t:3d}: off-diag = {Jt[0,1]:.4f} (= {t} * 0.99^{t} = {t * 0.99**t:.4f})')

8. Singular Value Decomposition

$A = U\Sigma V^\top$. Singular values are $\sqrt{\text{eigenvalues of } A^\top A}$. We compute the SVD, verify the connection to eigenvalues, and demonstrate the Eckart-Young theorem.

Code cell 27

# === 8.1-8.3 SVD and Connection to Eigenvalues ===

np.random.seed(0)
A = np.random.randn(4, 3)  # rectangular

U, s, Vt = la.svd(A, full_matrices=False)
print(f'A shape: {A.shape}')
print(f'U shape: {U.shape}, s shape: {s.shape}, Vt shape: {Vt.shape}')
print(f'Singular values: {s}')

# Verify: singular values = sqrt(eigenvalues of ATA)
AtA = A.T @ A
evals_AtA = la.eigvalsh(AtA)
sv_from_eig = np.sqrt(np.maximum(0, evals_AtA))[::-1]
print(f'\nsqrt(eigenvalues of ATA): {sv_from_eig}')
print(f'SVD singular values:       {s}')
print(f'PASS - match' if np.allclose(s, sv_from_eig) else 'FAIL')

# Verify reconstruction
A_recon = U @ np.diag(s) @ Vt
print(f'\nReconstruction error: {la.norm(A - A_recon):.2e}')
print(f'PASS - A = U Sigma Vt' if np.allclose(A, A_recon) else 'FAIL')

Code cell 28

# === 8.3 Eckart-Young: Best Rank-r Approximation ===

np.random.seed(5)
m, n = 20, 15
# Create a rank-3 matrix plus noise
true_rank = 3
A_clean = np.random.randn(m, true_rank) @ np.random.randn(true_rank, n)
A_noisy = A_clean + 0.3 * np.random.randn(m, n)

U, s, Vt = la.svd(A_noisy, full_matrices=False)

print('Singular values of A_noisy:')
print(s[:8].round(3), '...')

errors = []
for r in range(1, min(m, n)):
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    errors.append(la.norm(A_noisy - A_r, 'fro'))

print(f'\nFrobenius errors by rank:')
for r in [1, 2, 3, 4, 5, 10]:
    print(f'  rank-{r} approx error: {errors[r-1]:.4f}')

# Verify Eckart-Young: error = sqrt(sum sigma_{r+1}^2 + ...)
r = 3
ey_bound = np.sqrt(np.sum(s[r:]**2))
print(f'\nEckart-Young bound for rank-{r}: {ey_bound:.6f}')
print(f'Actual error:                    {errors[r-1]:.6f}')
print(f'PASS - Eckart-Young tight' if np.isclose(ey_bound, errors[r-1]) else 'FAIL')

if HAS_MPL:
    plt.figure(figsize=(8, 4))
    plt.plot(range(1, min(m,n)), errors, 'b-o', markersize=5)
    plt.axvline(true_rank, color='red', linestyle='--', label=f'True rank={true_rank}')
    plt.xlabel('Rank r'); plt.ylabel('||A - A_r||_F')
    plt.title('Eckart-Young: Truncated SVD Approximation Error')
    plt.legend(); plt.tight_layout(); plt.show()

9. Eigendecomposition in AI Applications

This section connects eigenvalues to the most important ML algorithms and phenomena: PCA, gradient descent convergence, the neural network Hessian spectrum, attention eigenstructure, graph neural networks, spectral normalisation, RNN stability, and diffusion models.

9.1 Principal Component Analysis (PCA)

PCA decomposes the sample covariance matrix $C = \tilde{X}^\top\tilde{X}/(n-1)$ using the spectral theorem $C = Q\Lambda Q^\top$. The eigenvectors $\mathbf{q}_i$ are the principal components; the eigenvalues $\lambda_i$ are the variances they capture. We implement PCA from scratch using np.linalg.eigh and check the rank-2 reconstruction error against the Eckart-Young bound.

Code cell 31

# === 9.1 PCA from Scratch via Eigendecomposition ===

np.random.seed(0)
n, d = 200, 3
# Correlated 3D data: embedded in a 2D subspace + noise
true_components = np.array([[0.7071, 0.7071, 0.0],
                             [0.0,    0.0,    1.0]])
scores = np.random.randn(n, 2) * np.array([3.0, 1.5])
X = scores @ true_components + 0.1 * np.random.randn(n, d)

# 1. Centre
X_c = X - X.mean(axis=0)

# 2. Covariance matrix
C = (X_c.T @ X_c) / (n - 1)

# 3. Eigendecomposition (eigh: symmetric, ascending order)
eigvals, eigvecs = la.eigh(C)
# eigh returns ascending — reverse for descending
eigvals = eigvals[::-1]
eigvecs = eigvecs[:, ::-1]

print('Covariance matrix:')
print(np.round(C, 4))
print(f'\nEigenvalues (variances): {eigvals.round(4)}')
print(f'Explained variance ratio: {(eigvals / eigvals.sum()).round(4)}')
print(f'Top-2 components explain {100*(eigvals[:2].sum()/eigvals.sum()):.1f}% of variance')

# 4. Project onto top-2 PCs
Z = X_c @ eigvecs[:, :2]  # (n, 2) scores
print(f'\nProjected data shape: {Z.shape}')

# 5. Reconstruction error
X_recon = Z @ eigvecs[:, :2].T
recon_err = la.norm(X_c - X_recon, 'fro')**2 / la.norm(X_c, 'fro')**2
print(f'Relative reconstruction error (rank-2): {recon_err:.4f}')
print(f'Eckart-Young bound: sum of dropped eigenvalues / total = {eigvals[2]/eigvals.sum():.4f}')

if HAS_MPL:
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    # Scatter: PC1 vs PC2
    axes[0].scatter(Z[:, 0], Z[:, 1], alpha=0.4, s=15)
    axes[0].set_xlabel('PC1'); axes[0].set_ylabel('PC2')
    axes[0].set_title('PCA Projection (top 2 PCs)')
    # Eigenvalue spectrum
    axes[1].bar(range(1, d+1), eigvals, color='steelblue')
    axes[1].set_xlabel('Component'); axes[1].set_ylabel('Eigenvalue (variance)')
    axes[1].set_title('Eigenvalue Spectrum (Scree Plot)')
    # Cumulative explained variance
    cum_var = np.cumsum(eigvals) / eigvals.sum()
    axes[2].plot(range(1, d+1), cum_var, 'o-', color='steelblue')
    axes[2].axhline(0.95, color='red', linestyle='--', label='95% threshold')
    axes[2].set_xlabel('Components retained'); axes[2].set_ylabel('Cumulative variance')
    axes[2].set_title('Cumulative Explained Variance'); axes[2].legend()
    plt.tight_layout(); plt.show()
    print('Plot: PCA scree plot and projection')
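As a cross-check (sketched with plain NumPy, regenerating the same synthetic data as the cell above), PCA via `eigh` on the covariance must agree with an SVD of the centred data: the singular values satisfy $\sigma_i^2/(n-1) = \lambda_i$, and the right singular vectors match the eigenvectors up to sign.

```python
import numpy as np
import numpy.linalg as la

np.random.seed(0)
n, d = 200, 3
true_components = np.array([[0.7071, 0.7071, 0.0],
                            [0.0,    0.0,    1.0]])
scores = np.random.randn(n, 2) * np.array([3.0, 1.5])
X = scores @ true_components + 0.1 * np.random.randn(n, d)
X_c = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix.
eigvals, eigvecs = la.eigh((X_c.T @ X_c) / (n - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Route 2: SVD of the centred data itself.
U, s, Vt = la.svd(X_c, full_matrices=False)

print('eigenvalues of C:', eigvals.round(6))
print('sigma^2 / (n-1): ', (s**2 / (n - 1)).round(6))

# Eigenvectors and right singular vectors agree up to sign:
# the per-column |cosine| should be 1.
cos = np.abs(np.sum(eigvecs * Vt.T, axis=0))
print('|cos| between PC directions:', cos.round(6))
```

The SVD route is how PCA is usually implemented in practice: it avoids forming $X^\top X$, which squares the condition number.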

9.2 Eigenvalues and Gradient Descent Convergence

For a quadratic loss $\mathcal{L}(\boldsymbol{\theta}) = \frac{1}{2}\boldsymbol{\theta}^\top H \boldsymbol{\theta} - \mathbf{b}^\top \boldsymbol{\theta}$, GD with step $\eta$ converges iff $\eta < 2/\lambda_{\max}$. The error in the $i$-th eigendirection decays as $(1 - \eta\lambda_i)^t$. The condition number $\kappa = \lambda_{\max}/\lambda_{\min}$ governs how badly conditioned the problem is.

Code cell 33

# === 9.2 Gradient Descent on a Quadratic Loss ===

# Build a 5x5 PSD Hessian with known eigenvalues
eigvals_H = np.array([100., 50., 10., 1., 0.1])
Q, _ = la.qr(np.random.randn(5, 5))
H = Q @ np.diag(eigvals_H) @ Q.T
b = np.random.randn(5)
theta_star = la.solve(H, b)  # optimal solution

kappa = eigvals_H[0] / eigvals_H[-1]
print(f'Hessian eigenvalues: {eigvals_H}')
print(f'Condition number kappa = {kappa:.1f}')
print(f'Optimal step: eta = 2/(lambda_max + lambda_min) = {2/(eigvals_H[0]+eigvals_H[-1]):.4f}')

def run_gd(H, b, theta_star, eta, T=200):
    theta = np.zeros(5)
    errors = []
    for _ in range(T):
        grad = H @ theta - b
        theta = theta - eta * grad
        errors.append(la.norm(theta - theta_star))
    return np.array(errors)

eta_opt = 2.0 / (eigvals_H[0] + eigvals_H[-1])  # optimal step
eta_large = 1.9 / eigvals_H[0]                   # near stability boundary
eta_small = 0.5 / eigvals_H[0]                   # conservative step

errors_opt   = run_gd(H, b, theta_star, eta_opt)
errors_large = run_gd(H, b, theta_star, eta_large)
errors_small = run_gd(H, b, theta_star, eta_small)

print(f'\nErrors at T=200 — optimal: {errors_opt[-1]:.2e}, large: {errors_large[-1]:.2e}, small: {errors_small[-1]:.2e}')

# Theoretical convergence rate for optimal step
rho = (kappa - 1) / (kappa + 1)
print(f'Theoretical convergence rate rho = (kappa-1)/(kappa+1) = {rho:.4f}')
print(f'Error should decay as rho^t ~ {rho**200:.2e} after 200 steps')

if HAS_MPL:
    plt.figure(figsize=(10, 4))
    plt.semilogy(errors_opt,   label=f'eta_opt={eta_opt:.4f}')
    plt.semilogy(errors_large, label=f'eta_large={eta_large:.4f}', linestyle='--')
    plt.semilogy(errors_small, label=f'eta_small={eta_small:.4f}', linestyle=':')
    # Theoretical rate
    t = np.arange(200)
    plt.semilogy(t, errors_opt[0] * rho**t, 'k--', alpha=0.4, label=f'Theory: rho^t, rho={rho:.3f}')
    plt.xlabel('Iteration'); plt.ylabel('||theta - theta*|| (log scale)')
    plt.title(f'GD Convergence on Quadratic Loss (kappa={kappa:.0f})')
    plt.legend(); plt.tight_layout(); plt.show()
    print('Plot: GD convergence for different step sizes')
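The claim that the error in the $i$-th eigendirection decays as $(1 - \eta\lambda_i)^t$ can be checked component-wise (a sketch using the same Hessian construction, reseeded so it is self-contained): express the error in the Hessian's eigenbasis and compare against the predicted per-direction decay.

```python
import numpy as np
import numpy.linalg as la

np.random.seed(42)
eigvals_H = np.array([100., 50., 10., 1., 0.1])
Q, _ = la.qr(np.random.randn(5, 5))
H = Q @ np.diag(eigvals_H) @ Q.T
b = np.random.randn(5)
theta_star = la.solve(H, b)

eta = 0.5 / eigvals_H[0]           # conservative step: all |1 - eta*lambda_i| < 1
theta = np.zeros(5)
e0 = Q.T @ (theta - theta_star)    # initial error in the eigenbasis

T = 30
for _ in range(T):
    theta = theta - eta * (H @ theta - b)

# GD error recurrence: e_{t+1} = (I - eta H) e_t, which is diagonal
# in the eigenbasis, so component i is multiplied by (1 - eta*lambda_i).
eT = Q.T @ (theta - theta_star)
predicted = (1 - eta * eigvals_H)**T * e0
print('measured  error components:', eT.round(8))
print('predicted (1-eta*lam)^T e0:', predicted.round(8))
```

Note how the $\lambda = 100$ component is essentially gone after 30 steps ($0.5^{30}$) while the $\lambda = 0.1$ component has barely moved ($0.9995^{30}$): this is the conditioning problem in miniature.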

9.3 Neural Network Hessian Spectrum

The Hessian of a neural network's loss has a distinctive spectrum: a bulk near zero (flat, slow directions) plus a few large outlier eigenvalues (sharp, fast directions). We simulate this structure and demonstrate its implications for training dynamics.

Code cell 35

# === 9.3 Simulated Neural Network Hessian Spectrum ===

# Simulate a realistic Hessian spectrum:
# - Bulk: many near-zero eigenvalues (flat landscape)
# - Outliers: a few large eigenvalues (sharp directions)
np.random.seed(42)
n_params = 200

# Bulk: exponentially decaying eigenvalues (Marchenko-Pastur-like)
bulk_size = 180
bulk_eigs = np.exp(-np.linspace(0, 5, bulk_size)) * 0.5 + np.random.exponential(0.02, bulk_size)

# Outliers: a few large eigenvalues (task-relevant directions)
outlier_eigs = np.array([85., 42., 18., 7., 3., 1.5, 0.8, 0.4, 0.2, 0.15])

all_eigs = np.concatenate([bulk_eigs, outlier_eigs])
all_eigs = np.sort(all_eigs)[::-1]

print(f'Total parameters (proxy): {len(all_eigs)}')
print(f'Largest eigenvalue (lambda_max): {all_eigs[0]:.2f}')
print(f'Median eigenvalue: {np.median(all_eigs):.4f}')
print(f'Smallest eigenvalue: {all_eigs[-1]:.4f}')
print(f'Condition number: {all_eigs[0]/all_eigs[all_eigs>1e-10].min():.0f}')

# Optimal learning rate from Hessian spectrum
eta_max = 2.0 / all_eigs[0]
eta_opt = 2.0 / (all_eigs[0] + all_eigs[all_eigs > 1e-10].min())
print(f'\nMax stable lr (< 2/lambda_max): {eta_max:.4f}')
print(f'Optimal lr (Chebyshev): {eta_opt:.4f}')

# Timescale separation: "slow" directions are eigenvalues below 10% of lambda_max
slow_eigs = all_eigs[all_eigs < all_eigs[0] * 0.1]
print(f'\nSlowest eigenvalue: {slow_eigs[-1]:.4f}')
print(f'Steps to reduce slow-direction error by 1/e: ~{1/(eta_opt * slow_eigs[-1]):.0f}')
print(f'Steps to reduce fast-direction error by 1/e: ~{1/(eta_opt * all_eigs[0]):.0f}')

if HAS_MPL:
    fig, axes = plt.subplots(1, 2, figsize=(14, 4))
    # Full spectrum
    axes[0].semilogy(all_eigs, '.', markersize=4)
    axes[0].set_xlabel('Eigenvalue index'); axes[0].set_ylabel('Eigenvalue (log scale)')
    axes[0].set_title('Simulated Hessian Spectrum')
    axes[0].axhline(all_eigs[0], color='red', linestyle='--', alpha=0.5, label=f'lambda_max={all_eigs[0]:.0f}')
    axes[0].legend()
    # Histogram (bulk)
    axes[1].hist(all_eigs[all_eigs < 5], bins=50, color='steelblue', edgecolor='white')
    axes[1].set_xlabel('Eigenvalue'); axes[1].set_ylabel('Count')
    axes[1].set_title('Bulk Eigenvalue Distribution (lambda < 5)')
    plt.tight_layout(); plt.show()
    print('Plot: Hessian eigenvalue spectrum (bulk + outliers)')

9.4 Attention Eigenstructure

A softmax attention weight matrix $S$ is row-stochastic. By the Perron-Frobenius theorem, $\lambda_1 = 1$ with eigenvector $\mathbf{1}/\sqrt{n}$. We analyse the eigenspectrum of attention patterns and distinguish different attention head behaviours.

Code cell 37

# === 9.4 Attention Matrix Eigenstructure ===

from scipy.special import softmax as sp_softmax

def make_attention(n_seq, pattern='uniform'):
    """Create a row-stochastic attention matrix with different patterns."""
    if pattern == 'uniform':
        return np.ones((n_seq, n_seq)) / n_seq
    elif pattern == 'diagonal':
        return np.eye(n_seq)
    elif pattern == 'causal_sharp':
        # Causal attention with sharp focus on the current and previous token.
        # Future positions must be masked with -inf BEFORE the softmax; masking
        # with zeros (as one might naively do) leaks exp(0) probability mass
        # to future tokens and breaks causality.
        raw = np.full((n_seq, n_seq), -np.inf)
        for i in range(n_seq):
            raw[i, max(0, i-1):i+1] = 10.0  # attend strongly to prev 2
        out = np.zeros((n_seq, n_seq))
        for i in range(n_seq):
            out[i] = sp_softmax(raw[i])  # exp(-inf) = 0 for masked entries
        return out
    elif pattern == 'induction':
        # Induction-like: attend to specific prior position
        raw = np.eye(n_seq) * 0.1
        # Shift: attend 2 positions back
        shift = 2
        for i in range(shift, n_seq):
            raw[i, i-shift] = 5.0
        for i in range(n_seq):
            raw[i] = sp_softmax(raw[i])
        return raw

n_seq = 12
patterns = ['uniform', 'diagonal', 'causal_sharp', 'induction']

print('Attention Pattern Eigenvalue Analysis')
print('='*60)
for pat in patterns:
    S = make_attention(n_seq, pat)
    # Row-stochastic: all row sums = 1
    assert np.allclose(S.sum(axis=1), 1.0), f'Row sum check failed for {pat}'
    evals = np.sort(np.abs(la.eigvals(S)))[::-1]
    print(f'\n{pat.upper()}')
    print(f'  lambda_1 (dominant): {evals[0]:.6f}  (should be 1.0 by Perron-Frobenius)')
    print(f'  lambda_2:            {evals[1]:.6f}')
    print(f'  Spectral gap:        {evals[0]-evals[1]:.6f}')
    print(f'  |lambda_n|:          {evals[-1]:.6f}')

print('\nPASS - All patterns satisfy Perron-Frobenius (dominant eigenvalue = 1)')
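The cell checks only the eigenvalue magnitudes; the Perron eigenvector itself can be verified directly (a small sketch): every row-stochastic $S$ satisfies $S\mathbf{1} = \mathbf{1}$, so the normalised all-ones vector $\mathbf{1}/\sqrt{n}$ is an eigenvector with eigenvalue exactly 1.

```python
import numpy as np

n_seq = 12
# Uniform attention: every token attends equally to every position.
S = np.ones((n_seq, n_seq)) / n_seq

# Each row sums to 1, so S maps the all-ones direction to itself.
v = np.ones(n_seq) / np.sqrt(n_seq)
residual = np.linalg.norm(S @ v - 1.0 * v)
print(f'||S v - 1*v|| for v = 1/sqrt(n): {residual:.2e}')
```

The same check holds for every pattern in the cell above, since the row-wise softmax makes each of them row-stochastic by construction.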

9.5 Graph Laplacian and GNNs

The graph Laplacian $L = D - A$ (where $D$ is the degree matrix and $A$ the adjacency matrix) is symmetric PSD. Its eigenvalues $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ encode graph connectivity: the number of zero eigenvalues equals the number of connected components, and $\lambda_2$ (the Fiedler value) measures how well-connected the graph is. Graph convolution in GNNs is filtering in the Laplacian eigenspace.

Code cell 39

# === 9.5 Graph Laplacian Eigenvalues ===

def build_laplacian(adj):
    """Build graph Laplacian L = D - A from adjacency matrix."""
    D = np.diag(adj.sum(axis=1))
    return D - adj

# Example 1: Path graph P_6
n = 6
A_path = np.zeros((n, n))
for i in range(n-1):
    A_path[i, i+1] = A_path[i+1, i] = 1.0

# Example 2: Two triangles (disconnected -> 2 components)
A_disc = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5)]:
    A_disc[i,j] = A_disc[j,i] = 1.0

# Example 3: Complete graph K_6
A_complete = np.ones((n, n)) - np.eye(n)

for name, A in [('Path P_6', A_path), ('2 Triangles (disconnected)', A_disc), ('Complete K_6', A_complete)]:
    L = build_laplacian(A)
    evals = np.sort(la.eigh(L)[0])
    n_components = np.sum(evals < 1e-8)
    fiedler = evals[1]  # lambda_2: the Fiedler value, zero iff the graph is disconnected
    print(f'{name}:')
    print(f'  Eigenvalues: {evals.round(4)}')
    print(f'  Zero eigenvalues (components): {n_components}')
    print(f'  Fiedler value (lambda_2): {fiedler:.4f}')
    print(f'  lambda_max: {evals[-1]:.4f}')
    print()

# Normalised Laplacian (used in GNNs)
def norm_laplacian(adj):
    d = adj.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d**(-0.5), 0.0)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return np.eye(len(d)) - D_inv_sqrt @ adj @ D_inv_sqrt

L_norm = norm_laplacian(A_path)
evals_norm = np.sort(la.eigh(L_norm)[0])
print(f'Normalised Laplacian (Path P_6) eigenvalues: {evals_norm.round(4)}')
print(f'Range [0, 2]: min={evals_norm[0]:.4f}, max={evals_norm[-1]:.4f}')
print('PASS - all normalised Laplacian eigenvalues in [0, 2]' if np.all(evals_norm >= -1e-10) and np.all(evals_norm <= 2+1e-10) else 'FAIL')
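The claim that graph convolution is filtering in the Laplacian eigenspace can be sketched directly: expand a graph signal in the eigenbasis of $L$, attenuate the high-frequency (large-$\lambda$) components, and synthesise back. A minimal sketch on the same path graph $P_6$; the cutoff value and the noisy ramp signal are illustrative choices, not part of the original notebook:

```python
import numpy as np

# Path graph P_6 and its Laplacian (same construction as above)
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Columns of U form the "graph Fourier" basis; lam are the frequencies
lam, U = np.linalg.eigh(L)

# A noisy graph signal: smooth ramp plus high-frequency noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, n) + 0.3 * rng.standard_normal(n)

# Low-pass spectral filter: keep only eigenmodes with lambda below a cutoff
h = (lam < 1.0).astype(float)      # cutoff 1.0 is an arbitrary illustrative choice
x_filtered = U @ (h * (U.T @ x))   # analyse -> attenuate -> synthesise

# x^T L x = sum_i lambda_i c_i^2 measures roughness; a low-pass filter
# can only reduce it
print(f'roughness before: {x @ L @ x:.4f}, after: {x_filtered @ L @ x_filtered:.4f}')
```

Learnable GNN layers generalise this by parameterising the filter response $h(\lambda)$ (usually as a low-order polynomial, so that no explicit eigendecomposition is needed).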

9.6 Spectral Normalisation of Weight Matrices

Spectral normalisation divides each weight matrix $W$ by its spectral norm (largest singular value), enforcing a Lipschitz-1 constraint. The largest singular value is $\sigma_{\max}(W) = \sqrt{\lambda_{\max}(W^\top W)}$. We implement spectral normalisation and demonstrate how it bounds the Lipschitz constant of a network.

Code cell 41

# === 9.6 Spectral Normalisation ===

np.random.seed(7)

# Random weight matrix (simulating a linear layer)
W = np.random.randn(64, 32) * 0.5

def spectral_norm(W):
    """Compute spectral norm = largest singular value."""
    return la.svd(W, compute_uv=False)[0]

def power_iteration_spectral(W, n_iters=50):
    """Estimate spectral norm via power iteration on W^T W."""
    m, n = W.shape
    v = np.random.randn(n); v /= la.norm(v)
    for _ in range(n_iters):
        u = W @ v; u /= la.norm(u)
        v = W.T @ u; v /= la.norm(v)
    return float(u @ W @ v)

sigma_exact = spectral_norm(W)
sigma_pi = power_iteration_spectral(W)

print(f'Weight matrix shape: {W.shape}')
print(f'Spectral norm (SVD):            {sigma_exact:.6f}')
print(f'Spectral norm (power iteration): {sigma_pi:.6f}')
print(f'Relative error: {abs(sigma_exact - sigma_pi)/sigma_exact:.2e}')

# Spectrally normalised matrix
W_sn = W / sigma_exact
sigma_after = spectral_norm(W_sn)
print(f'\nAfter spectral normalisation:')
print(f'  Spectral norm: {sigma_after:.6f}  (should be 1.0)')
print(f'  All singular values <= 1: {np.all(la.svd(W_sn, compute_uv=False) <= 1 + 1e-10)}')

# Lipschitz constant of a 3-layer spectral-normalised network
layer_sigmas = []
for shape in [(32, 64), (16, 32), (4, 16)]:
    W_l = np.random.randn(*shape)
    W_l_sn = W_l / spectral_norm(W_l)
    layer_sigmas.append(spectral_norm(W_l_sn))

lip_bound = np.prod(layer_sigmas)
print(f'\n3-layer spectral-normalised network:')
print(f'  Layer spectral norms: {[f"{s:.4f}" for s in layer_sigmas]}')
print(f'  Lipschitz bound (product): {lip_bound:.4f}  (should be <= 1.0)')
print('PASS' if lip_bound <= 1.0 + 1e-10 else 'FAIL')
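The Lipschitz-1 property can also be checked empirically: for any input pair, $\|W_{\text{sn}}x - W_{\text{sn}}y\| \leq \|x - y\|$, with equality approached along the top singular direction. A quick sketch (the random-pair sampling is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(7)
W = rng.standard_normal((64, 32)) * 0.5
sigma = np.linalg.svd(W, compute_uv=False)[0]
W_sn = W / sigma                      # spectrally normalised layer

# For random input pairs, the spectral norm bound gives
# ||W_sn x - W_sn y|| <= 1 * ||x - y||
ratios = []
for _ in range(1000):
    x, y = rng.standard_normal(32), rng.standard_normal(32)
    ratios.append(np.linalg.norm(W_sn @ (x - y)) / np.linalg.norm(x - y))
ratios = np.array(ratios)
print(f'max empirical ratio: {ratios.max():.6f}  (never exceeds 1)')
```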

9.7 Eigenvalues in Recurrent Networks

A linear RNN $\mathbf{h}_t = W\mathbf{h}_{t-1} + U\mathbf{x}_t$ has hidden state $\mathbf{h}_t = W^t\mathbf{h}_0 + \cdots$. The spectral radius $\rho(W) = \max_i |\lambda_i(W)|$ determines stability: if $\rho(W) > 1$, hidden states (and backpropagated gradients) explode; if $\rho(W) < 1$, they decay, which causes vanishing gradients. $\rho(W) = 1$ is the "edge of chaos" that echo state networks target.

Code cell 43

# === 9.7 RNN Stability via Spectral Radius ===

def run_linear_rnn(W, T=100):
    """Run linear RNN h_t = W h_{t-1} from random init, return ||h_t|| over time."""
    h = np.random.randn(W.shape[0]); h /= la.norm(h)
    norms = []
    for _ in range(T):
        h = W @ h
        norms.append(la.norm(h))
    return np.array(norms)

np.random.seed(0)
n_hidden = 20

configs = []
for rho_target, label in [(1.2, 'rho=1.2 (explodes)'), (1.0, 'rho=1.0 (edge of chaos)'), (0.8, 'rho=0.8 (stable)')]:
    W_rand = np.random.randn(n_hidden, n_hidden)
    rho_current = np.max(np.abs(la.eigvals(W_rand)))
    W_scaled = W_rand * (rho_target / rho_current)  # rescale to target spectral radius
    rho_actual = np.max(np.abs(la.eigvals(W_scaled)))
    norms = run_linear_rnn(W_scaled, T=60)
    configs.append((label, rho_actual, norms))
    print(f'{label}: rho={rho_actual:.4f}, ||h_60||={norms[-1]:.4e}')

# Gradient backprop through time: gradient norm ~ rho^T
T = 60
print(f'\nBackprop gradient norms at T={T}:')
for rho_target in [1.2, 1.0, 0.8]:
    print(f'  rho={rho_target}: ||gradient|| ~ {rho_target**T:.2e}')

if HAS_MPL:
    plt.figure(figsize=(10, 4))
    for label, rho, norms in configs:
        plt.semilogy(norms, label=f'{label}  rho={rho:.3f}')
    plt.xlabel('Time step'); plt.ylabel('||h_t|| (log scale)')
    plt.title('Linear RNN Hidden State Norm vs Spectral Radius')
    plt.legend(); plt.tight_layout(); plt.show()
    print('Plot: RNN stability depending on spectral radius')
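One way to sit exactly at the edge of chaos, used by orthogonal and unitary RNN initialisations, is an orthogonal recurrent matrix: all of its eigenvalues lie on the unit circle and every step preserves the hidden-state norm exactly. A minimal sketch (the QR-of-Gaussian construction is one common recipe, not taken from the notebook above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
# QR of a Gaussian matrix yields an orthogonal Q: all eigenvalues on the unit circle
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

h = rng.standard_normal(n)
norm0 = np.linalg.norm(h)
for _ in range(200):
    h = Q @ h                         # orthogonal maps preserve the 2-norm
print(f'||h_0|| = {norm0:.6f}, ||h_200|| = {np.linalg.norm(h):.6f}')
```

Contrast this with the rescaled Gaussian matrices above, whose state norms still drift because $\rho(W) = 1$ does not force all singular values to 1.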

9.8 Diffusion Models and Score Functions

Diffusion models add Gaussian noise: $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$. For a Gaussian prior with mean $\boldsymbol{\mu}$ and covariance $\Sigma$, the score function $\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t)$ has a direct eigenvalue interpretation: it equals $-\Sigma^{-1}(\mathbf{x}_t - \boldsymbol{\mu})$. Large eigenvalues of $\Sigma$ correspond to high-variance principal directions that are easy to score; small eigenvalues correspond to constrained directions whose score components are hard to learn.

Code cell 45

# === 9.8 Diffusion Score Function and Eigenvalues ===

# Gaussian data model: p(x0) = N(0, Sigma)
np.random.seed(42)
d = 4
# Known covariance with disparate eigenvalues
eigs_data = np.array([16., 4., 1., 0.25])
Q_data, _ = la.qr(np.random.randn(d, d))
Sigma = Q_data @ np.diag(eigs_data) @ Q_data.T

# Diffusion forward process: x_t = sqrt(alpha_t) * x_0 + sqrt(1-alpha_t) * eps
def diffuse(x0, alpha_t):
    eps = np.random.randn(*x0.shape)
    return np.sqrt(alpha_t) * x0 + np.sqrt(1 - alpha_t) * eps

# True score for q(x_t | x_0) ~ N(sqrt(alpha_t)*x0, (1-alpha_t)*I)
# Marginal q(x_t) ~ N(0, alpha_t * Sigma + (1-alpha_t)*I)
def marginal_score(xt, alpha_t, Sigma):
    Sigma_t = alpha_t * Sigma + (1 - alpha_t) * np.eye(d)
    return -la.solve(Sigma_t, xt)

# Eigenvalues of marginal covariance at different noise levels
print('Marginal covariance eigenvalues at different noise levels:')
print(f'{"alpha_t":>8} | {"Data eigs (scaled)":>30} | {"Noise I":>12} | {"Marginal eigs":>20}')
print('-' * 80)
for alpha_t in [1.0, 0.9, 0.5, 0.1, 0.01]:
    Sigma_t = alpha_t * Sigma + (1 - alpha_t) * np.eye(d)
    marg_eigs = np.sort(la.eigh(Sigma_t)[0])[::-1]
    print(f'{alpha_t:>8.2f} | data_scale={alpha_t:4.2f}*{eigs_data} | noise={(1-alpha_t):.2f} | {marg_eigs.round(3)}')

print('\nAt alpha_t=1: marginal = data distribution (full signal)')
print('At alpha_t=0: marginal = N(0, I) (pure noise, trivial score = -x_t)')
print('Large data eigenvalues persist longer before being overwhelmed by noise')
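The analytic score $-\Sigma_t^{-1}\mathbf{x}_t$ can be sanity-checked against central finite differences of the Gaussian log-density; for a quadratic log-density the two agree up to roundoff. A sketch using the same toy covariance (the test point and step size are illustrative):

```python
import numpy as np
from numpy import linalg as la

rng = np.random.default_rng(42)
d = 4
eigs_data = np.array([16., 4., 1., 0.25])
Qd, _ = la.qr(rng.standard_normal((d, d)))
Sigma = Qd @ np.diag(eigs_data) @ Qd.T

alpha_t = 0.5
Sigma_t = alpha_t * Sigma + (1 - alpha_t) * np.eye(d)  # marginal covariance

def log_density(x):
    # Unnormalised Gaussian log-density; the constant cancels in gradients
    return -0.5 * x @ la.solve(Sigma_t, x)

x = rng.standard_normal(d)
score_analytic = -la.solve(Sigma_t, x)

# Central finite differences of log p (exact for a quadratic, up to roundoff)
eps = 1e-5
score_fd = np.array([
    (log_density(x + eps * e) - log_density(x - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print(f'max |analytic - finite-diff|: {np.abs(score_analytic - score_fd).max():.2e}')
```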

10. Generalised Eigenproblems

The generalised eigenproblem $A\mathbf{v} = \lambda B\mathbf{v}$ arises when the standard eigenproblem is embedded in a different inner product space defined by $B$. Key applications: linear discriminant analysis (LDA), canonical correlation analysis (CCA), and natural gradient methods, where optimisation is Riemannian under the Fisher information metric.

Code cell 47

# === 10. Generalised Eigenproblem: Av = lambda Bv ===

# Build a generalised eigenproblem Av = lambda Bv
# Example: LDA between-class vs within-class scatter
np.random.seed(5)

# Simulate 3-class data in 4D
d, n_per_class = 4, 50
means = np.array([[2., 0., 0., 0.],
                   [0., 2., 0., 0.],
                   [0., 0., 2., 0.]])
Sw = np.eye(d) * 0.5  # within-class scatter (simplified)

# Between-class scatter Sb
grand_mean = means.mean(axis=0)
Sb = sum(n_per_class * np.outer(m - grand_mean, m - grand_mean) for m in means)

print('LDA Generalised Eigenproblem: Sb v = lambda Sw v')
print(f'Between-class scatter Sb shape: {Sb.shape}')
print(f'Within-class scatter Sw shape:  {Sw.shape}')

# Solve via scipy's eigh for generalised problem
evals_gen, evecs_gen = sla.eigh(Sb, Sw)
# Sort descending
idx = np.argsort(evals_gen)[::-1]
evals_gen = evals_gen[idx]
evecs_gen = evecs_gen[:, idx]

print(f'\nGeneralised eigenvalues (Fisher discriminant ratios): {evals_gen.round(4)}')
print(f'Number of discriminant directions: {np.sum(evals_gen > 1e-8)} (= min(n_classes-1, d) = {min(2,d)})')

# Verify Av = lambda Bv
for i in range(2):
    v = evecs_gen[:, i]
    lhs = Sb @ v
    rhs = evals_gen[i] * (Sw @ v)
    print(f'PASS - generalised Av=lambda Bv for eigenvalue {evals_gen[i]:.4f}' if np.allclose(lhs, rhs) else f'FAIL at i={i}')

# Generalised B-orthogonality: v_i^T Sw v_j = delta_ij
G = evecs_gen[:, :2].T @ Sw @ evecs_gen[:, :2]
print(f'\nB-orthogonality matrix (should be identity):')
print(G.round(6))
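When $B$ is positive definite, the generalised problem reduces to a standard symmetric eigenproblem by whitening: writing $B = B^{1/2}B^{1/2}$, the equation $A\mathbf{v} = \lambda B\mathbf{v}$ becomes $(B^{-1/2}AB^{-1/2})\mathbf{u} = \lambda\mathbf{u}$ with $\mathbf{v} = B^{-1/2}\mathbf{u}$. A sketch on the same toy scatter matrices, checked against scipy's generalised solver:

```python
import numpy as np
from numpy import linalg as la
from scipy import linalg as sla

# Same toy LDA scatter matrices as above
d, n_per_class = 4, 50
means = np.array([[2., 0., 0., 0.],
                  [0., 2., 0., 0.],
                  [0., 0., 2., 0.]])
Sw = np.eye(d) * 0.5
grand_mean = means.mean(axis=0)
Sb = sum(n_per_class * np.outer(m - grand_mean, m - grand_mean) for m in means)

# Whitening: with Sw = Sw^{1/2} Sw^{1/2}, the problem Sb v = lambda Sw v
# becomes the standard symmetric eigenproblem C u = lambda u,
# where C = Sw^{-1/2} Sb Sw^{-1/2} and v = Sw^{-1/2} u
w_vals, w_vecs = la.eigh(Sw)
Sw_inv_sqrt = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
C = Sw_inv_sqrt @ Sb @ Sw_inv_sqrt

evals_std = np.sort(la.eigh(C)[0])[::-1]
evals_gen = np.sort(sla.eigh(Sb, Sw, eigvals_only=True))[::-1]
print(f'Eigenvalues via whitening:        {evals_std.round(4)}')
print(f'Eigenvalues via sla.eigh(Sb, Sw): {evals_gen.round(4)}')
```

This is exactly what `scipy.linalg.eigh(A, B)` does internally (via a Cholesky factor of $B$ rather than a symmetric square root).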

11. Eigenvalues and Stability Analysis

Linear dynamical systems $\mathbf{x}_{t+1} = A\mathbf{x}_t$ (discrete) or $\dot{\mathbf{x}} = A\mathbf{x}$ (continuous) are stable iff the eigenvalues of $A$ lie inside the unit disk (discrete) or the open left half-plane (continuous). We analyse fixed points, stability, and bifurcations.

Code cell 49

# === 11. Stability Analysis of Linear Dynamical Systems ===

def stability_status(A, system='discrete'):
    """Return (eigenvalues, stable?, margin) for the linear system defined by A."""
    evals = la.eigvals(A)
    if system == 'discrete':
        stable = np.all(np.abs(evals) < 1)
        rho = np.max(np.abs(evals))   # spectral radius
    else:  # continuous
        stable = np.all(evals.real < 0)
        rho = np.max(evals.real)      # spectral abscissa
    return evals, stable, rho

# --- Discrete system examples ---
print('DISCRETE SYSTEMS (x_{t+1} = Ax_t): stable iff all |lambda_i| < 1')
print('='*65)

cases_discrete = [
    ('Stable decay',     np.array([[0.5, 0.1], [-0.1, 0.3]])),
    ('Marginally stable', np.array([[0.8, 0.6], [-0.6, 0.8]])),  # rotation
    ('Unstable growth',  np.array([[1.2, 0.1], [0.0, 0.9]])),
]
for name, A in cases_discrete:
    evals, stable, rho = stability_status(A, 'discrete')
    status = 'STABLE' if stable else 'UNSTABLE'
    print(f'{name}: rho(A)={rho:.4f}, |eigenvalues|={np.round(np.abs(evals), 4)}, {status}')

# --- Continuous system examples ---
print('\nCONTINUOUS SYSTEMS (dx/dt = Ax): stable iff all Re(lambda_i) < 0')
print('='*65)

cases_cont = [
    ('Stable spiral',    np.array([[-1., 2.], [-2., -1.]])),
    ('Centre (marginal)', np.array([[0., 1.], [-1., 0.]])),
    ('Unstable saddle',  np.array([[1., 0.], [0., -2.]])),
]
for name, A in cases_cont:
    evals, stable, rho = stability_status(A, 'continuous')
    status = 'STABLE' if stable else 'UNSTABLE'
    print(f'{name}: max Re(lambda)={rho:.4f}, eigenvalues={np.round(evals, 3)}, {status}')

# --- Visualise trajectories for 2D systems ---
if HAS_MPL:
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    T = 80
    for ax, (name, A) in zip(axes, cases_discrete):
        for _ in range(8):
            x = np.random.randn(2) * 2
            traj = [x.copy()]
            for _ in range(T):
                x = A @ x
                traj.append(x.copy())
                if la.norm(x) > 50: break
            traj = np.array(traj)
            ax.plot(traj[:,0], traj[:,1], alpha=0.5, linewidth=0.8)
            ax.plot(traj[0,0], traj[0,1], 'go', markersize=4)
        evals, _, rho = stability_status(A)
        ax.set_title(f'{name}\nrho={rho:.3f}')
        ax.set_xlim(-6, 6); ax.set_ylim(-6, 6)
        ax.set_aspect('equal')
    plt.suptitle('Discrete Linear System Trajectories', fontsize=13)
    plt.tight_layout(); plt.show()
    print('Plot: phase portraits for stable/marginal/unstable discrete systems')
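For the continuous case the solution is $\mathbf{x}(t) = e^{At}\mathbf{x}_0$, computable with `scipy.linalg.expm`. A sketch on the stable spiral above; for this particular $A = -I + 2K$ (with $K$ a rotation generator), $e^{At}$ is $e^{-t}$ times a rotation, so the norm decays exactly as $e^{-t}$:

```python
import numpy as np
from scipy.linalg import expm

# Continuous-time trajectories x(t) = expm(A t) x0 for the stable spiral above
A = np.array([[-1., 2.], [-2., -1.]])   # eigenvalues -1 +/- 2i
x0 = np.array([3., 0.])

for t in [0.0, 1.0, 3.0, 6.0]:
    x_t = expm(A * t) @ x0
    # For this A, expm(A t) = e^{-t} * (a rotation), so ||x(t)|| = e^{-t} ||x0||
    print(f't={t:4.1f}: ||x(t)|| = {np.linalg.norm(x_t):.6f}  '
          f'(predicted {np.exp(-t) * np.linalg.norm(x0):.6f})')
```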

12. Common Mistakes

We demonstrate the most frequent eigenvalue errors with code that catches them.

Code cell 51

# === 12. Common Mistakes — Diagnostics ===

print('COMMON EIGENVALUE MISTAKES AND DIAGNOSTICS')
print('='*60)

# Mistake 1: Assuming eigenvectors are always real
print('\n[1] Non-real eigenvectors for real rotation matrices')
theta = np.pi / 4  # 45-degree rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
evals_R = la.eigvals(R)
print(f'Rotation matrix eigenvalues: {np.round(evals_R, 4)}')
print(f'Real? {np.all(np.isreal(evals_R))} — they are complex (rotation has no real fixed directions)')

# Mistake 2: Confusing AM and GM
print('\n[2] Algebraic multiplicity != geometric multiplicity (defective matrices)')
A_defect = np.array([[2., 1.], [0., 2.]])  # Jordan block
evals_d = la.eigvals(A_defect)
evecs_d = la.eig(A_defect)[1]
rank_shift = la.matrix_rank(A_defect - 2*np.eye(2))
gm = 2 - rank_shift
print(f'Eigenvalues: {evals_d} (AM=2 for lambda=2)')
print(f'Geometric multiplicity = null(A-2I) dim = {gm}  (AM > GM => defective!)')

# Mistake 3: Using eig instead of eigh for symmetric matrices
print('\n[3] Using eig vs eigh for symmetric matrices')
S = np.array([[4., 2., 1.], [2., 3., 0.5], [1., 0.5, 2.]])
evals_eig  = np.sort(la.eigvals(S).real)[::-1]
evals_eigh = np.sort(la.eigh(S)[0])[::-1]
print(f'la.eigvals (general) eigenvalues:  {evals_eig.round(8)}')
print(f'la.eigh (symmetric) eigenvalues:   {evals_eigh.round(8)}')
print(f'Difference:          {np.abs(evals_eig - evals_eigh).max():.2e}')
print('Use eigh for symmetric/Hermitian — it is faster AND guarantees real results')

# Mistake 4: Not checking that eigenvectors actually satisfy Av = lambda v
print('\n[4] Always verify eigenpairs Av = lambda v')
A_check = np.random.randn(4, 4)
evals_c, evecs_c = la.eig(A_check)
residuals = [la.norm(A_check @ evecs_c[:,i] - evals_c[i]*evecs_c[:,i]) for i in range(4)]
print(f'Residuals ||Av - lambda v||: {[f"{r:.2e}" for r in residuals]}')
print(f'All residuals < 1e-12: {all(r < 1e-12 for r in residuals)}')

# Mistake 5: Forgetting that diagonalisation fails for defective matrices
print('\n[5] Diagonalisation fails for defective matrices')
try:
    evals5, P5 = la.eig(A_defect)
    D5 = np.diag(evals5)
    reconstructed = P5 @ D5 @ la.inv(P5)
    err = la.norm(A_defect - reconstructed)
    print(f'Reconstruction error ||A - P D P^-1||: {err:.6f}')
    print('High error => diagonalisation fails for defective matrices, need Jordan form')
except la.LinAlgError:
    print('LinAlgError: P is singular (expected for defective matrix)')

print('\nDone. Remember: always use eigh for symmetric, verify eigenpairs, check AM=GM for diagonalisability.')

13. Summary and Bridge to Exercises

This theory notebook has demonstrated:

| Section | Key Takeaway |
|---|---|
| 1. Intuition | Eigenvectors = preserved directions; eigenvalues = scaling factors |
| 2. Formal Definitions | $\det(A-\lambda I)=0$; AM $\geq$ GM; eigenspace = null$(A-\lambda I)$ |
| 3. Computing | Characteristic polynomial; power iteration; QR algorithm |
| 4. Properties | tr $= \sum\lambda_i$; det $= \prod\lambda_i$; Rayleigh quotient; Gershgorin |
| 5. Diagonalisation | $A=V\Lambda V^{-1}$; requires $n$ linearly independent eigenvectors |
| 6. Spectral Theorem | Symmetric $A$: $A=Q\Lambda Q^\top$, orthogonal $Q$, real $\Lambda$ |
| 7. Jordan Form | Defective matrices: $A=PJP^{-1}$ with Jordan blocks |
| 8. SVD | $A=U\Sigma V^\top$; singular values $= \sqrt{\text{eigenvalues of } A^\top A}$ |
| 9. AI Applications | PCA, GD convergence, Hessian, attention, GNNs, spectral norm, RNNs |
| 10. Generalised | $Av=\lambda Bv$; LDA, CCA, natural gradient |
| 11. Stability | $\rho(A)<1$ (discrete) / Re$(\lambda)<0$ (continuous) $\Rightarrow$ stable |
| 12. Mistakes | Use eigh; verify $Av=\lambda v$; check AM=GM; Jordan vs diagonal |

Next steps: Open exercises.ipynb for 8 graded problems covering all these concepts.


14. Why This Matters for AI (2026 Perspective)

A concise demonstration of the most AI-relevant eigenvalue fact not covered above: the Neural Tangent Kernel (NTK) eigenspectrum determines the convergence of infinitely-wide neural networks during gradient descent.

Code cell 54

# === 14. NTK-style Eigenvalue Analysis ===

# The Neural Tangent Kernel K = J J^T (J = Jacobian of network outputs w.r.t. params)
# For a linear network trained with GD at learning rate eta, the loss in each
# eigendirection decays as L_i(t) ~ L_i(0) * exp(-2 * eta * lambda_i(K) * t),
# where lambda_i(K) are the NTK eigenvalues.

np.random.seed(99)

# Simulate NTK eigenvalues for a 2-layer network on MNIST-like data (empirical)
# In practice NTK eigenvalues ~ polynomial decay lambda_k ~ k^{-alpha}
n_data = 50
alpha_ntk = 1.5  # power-law decay exponent
ntk_eigs = np.array([k**(-alpha_ntk) for k in range(1, n_data+1)])
ntk_eigs = ntk_eigs / ntk_eigs[0]  # normalise

print('Simulated NTK eigenvalue spectrum (power-law decay lambda_k ~ k^{-1.5})')
print(f'  lambda_1 = {ntk_eigs[0]:.4f} (fastest mode)')
print(f'  lambda_10 = {ntk_eigs[9]:.4f}')
print(f'  lambda_50 = {ntk_eigs[-1]:.6f} (slowest mode)')

# Loss decay in each eigendirection
eta = 0.5  # learning rate
T = 500
t = np.arange(T)

if HAS_MPL:
    fig, axes = plt.subplots(1, 2, figsize=(14, 4))
    # NTK eigenspectrum
    axes[0].loglog(range(1, n_data+1), ntk_eigs, 'o-', markersize=4)
    axes[0].set_xlabel('Eigenvalue index k'); axes[0].set_ylabel('lambda_k (log scale)')
    axes[0].set_title(f'NTK Eigenspectrum (power law alpha={alpha_ntk})')
    # Loss decay for different modes
    for k_idx, color in [(0,'red'), (4,'orange'), (9,'green'), (49,'blue')]:
        lam = ntk_eigs[k_idx]
        loss_decay = np.exp(-2 * lam * eta * t)
        axes[1].semilogy(t, loss_decay, color=color, label=f'mode k={k_idx+1}, lambda={lam:.4f}')
    axes[1].set_xlabel('Training step'); axes[1].set_ylabel('Loss in eigendirection (log)')
    axes[1].set_title('NTK: Loss Decay per Eigenmode')
    axes[1].legend(fontsize=9); plt.tight_layout(); plt.show()
    print('Plot: NTK spectrum and per-mode loss decay (spectral bias / frequency principle)')

print('\nKEY INSIGHT: Low-frequency (large eigenvalue) components are learned first.')
print('High-frequency (small eigenvalue) components take exponentially longer.')
print('This is the spectral bias / frequency principle of neural networks.')

15. Conceptual Bridge

Where we came from: Section 02-06 (Vector Spaces and Subspaces) gave us the language of null spaces, column spaces, and bases. Eigenvalues are the next level: instead of asking "what subspaces does $A$ preserve?" we ask "which directions does $A$ act on most simply?" Eigenspaces are invariant subspaces on which the map acts as a scalar.

Where we are going: Section 03-02 (Singular Value Decomposition) generalises the eigendecomposition to rectangular matrices and non-symmetric square matrices. Every matrix has an SVD; only square matrices with $n$ linearly independent eigenvectors have an eigendecomposition. The SVD $A = U\Sigma V^\top$ is the most important matrix factorisation in computational mathematics; it connects to eigenvalues via $\sigma_i = \sqrt{\lambda_i(A^\top A)}$.
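The bridge formula $\sigma_i = \sqrt{\lambda_i(A^\top A)}$ can be checked numerically in a few lines (the matrix here is an arbitrary illustrative example):

```python
import numpy as np
from numpy import linalg as la

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))      # arbitrary rectangular matrix

sv = la.svd(A, compute_uv=False)                 # singular values, descending
eig_AtA = np.sort(la.eigh(A.T @ A)[0])[::-1]     # eigenvalues of A^T A
sqrt_eigs = np.sqrt(np.clip(eig_AtA, 0.0, None)) # clip tiny negative roundoff

print('singular values:     ', sv.round(6))
print('sqrt(eigs of A^T A): ', sqrt_eigs.round(6))
```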

CONCEPTUAL MAP
════════════════════════════════════════════════════════════════

  Vector Spaces     Eigenvalues &      Singular Value
  & Subspaces   ──► Eigenvectors  ──► Decomposition
  (02-06)           (03-01) ◄HERE      (03-02)
                        │
                   ┌────┴──────────────────────────────┐
                   │  PCA · GD convergence · Hessian   │
                   │  Attention · GNNs · RNN stability │
                   │  Spectral norm · NTK · LoRA       │
                   └────────────────────────────────────┘
════════════════════════════════════════════════════════════════

The single most important thing to remember: The eigenvalue spectrum is the complete description of how a linear map acts over time. Everything about long-run behaviour, stability, convergence rate, and compression capacity is encoded in the eigenvalues. Internalise this and linear algebra becomes a lens for understanding all of deep learning.