
Random Graphs: Appendix L: Worked Examples to Appendix Q: Quick Reference Card

Appendix L: Worked Examples

L.1 Computing the Giant Component Fraction

Problem: For $G(n,p)$ with $p = 2.5/n$, find the theoretical fraction of vertices in the giant component.

Solution: We need $\beta$ satisfying $\beta = 1 - e^{-2.5\beta}$.

Define $f(\beta) = 1 - e^{-2.5\beta} - \beta$. We need $f(\beta) = 0$.

  • $f(0) = 0$ (trivial solution; in the subcritical regime this would be the only root)
  • $f'(\beta) = 2.5 e^{-2.5\beta} - 1$; $f'(0) = 1.5 > 0$ (the slope at 0 is positive)
  • $f$ is concave for $\beta > 0$, so there is exactly one non-trivial root

Newton's method: Starting from $\beta_0 = 0.7$:

  • $f(0.7) = 1 - e^{-1.75} - 0.7 = 1 - 0.1738 - 0.7 = 0.1262$
  • $f'(0.7) = 2.5 \cdot 0.1738 - 1 = -0.5655$
  • $\beta_1 = 0.7 - 0.1262/(-0.5655) = 0.7 + 0.2232 = 0.923$ (overshoot, so switch to bisection)

Using bisection between 0.7 and 0.95:

  • $f(0.8) = 1 - e^{-2} - 0.8 = 1 - 0.1353 - 0.8 = 0.0647 > 0$
  • $f(0.85) = 1 - e^{-2.125} - 0.85 = 1 - 0.1194 - 0.85 = 0.0306 > 0$
  • $f(0.88) \approx 1 - 0.1108 - 0.88 = 0.0092 > 0$
  • $f(0.89) \approx 1 - 0.1084 - 0.89 = 0.0016 > 0$
  • $f(0.895) \approx 1 - 0.1072 - 0.895 = -0.0022 < 0$

So $\beta \approx 0.892$ - about 89.2% of vertices are in the giant component.

Interpretation: At average degree 2.5, the network is well into the supercritical regime. The vast majority of vertices are connected in one giant component, leaving only about 10.8% in small satellite components.
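The same root can be found programmatically; a minimal bisection sketch (function name illustrative, not from the lesson):

```python
import math

def giant_component_fraction(c, tol=1e-10):
    """Solve beta = 1 - exp(-c*beta) for the non-trivial root (c > 1)."""
    if c <= 1:
        return 0.0  # subcritical: beta = 0 is the only root
    f = lambda b: 1 - math.exp(-c * b) - b
    lo, hi = 1e-12, 1.0  # f(lo) > 0 since the slope at 0 is c - 1 > 0; f(1) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(giant_component_fraction(2.5))  # about 0.89, matching the bisection above
```

Bisection is slower than Newton's method but cannot overshoot, which is convenient given the concave shape of $f$.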

L.2 Kesten-Stigum Threshold Calculation

Problem: For the 2-block SBM with $a = 15$ and $b = 4$, determine: (a) Is community detection possible? (b) Can we achieve exact recovery?

Solution:

(a) Detection (weak recovery): the Kesten-Stigum threshold requires $(a-b)^2 > 2(a+b)$.

$(15-4)^2 = 11^2 = 121$

$2(15+4) = 2 \cdot 19 = 38$

$121 > 38$, so detection is possible.

SNR $= (a-b)^2 / (2(a+b)) = 121/38 \approx 3.18 \gg 1$.

(b) Exact recovery: Need $(\sqrt{a} - \sqrt{b})^2 > 2$ (the scaled threshold for the logarithmic-degree regime).

$\sqrt{15} \approx 3.873$, $\sqrt{4} = 2$.

$(\sqrt{15} - \sqrt{4})^2 = (3.873 - 2)^2 = (1.873)^2 \approx 3.51 > 2$, so exact recovery is achievable.

Interpretation: With $a = 15$ and $b = 4$, we have a relatively easy community detection problem - both detection and exact recovery are achievable. Spectral clustering should work well here.
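Both threshold checks are easy to automate; a sketch (function name illustrative):

```python
import math

def sbm_regimes(a, b):
    """Check the 2-block SBM thresholds for within/between parameters a, b.

    Detection: Kesten-Stigum condition (a-b)^2 > 2(a+b) (constant-degree regime).
    Exact recovery: (sqrt(a) - sqrt(b))^2 > 2 (logarithmic-degree regime).
    """
    snr = (a - b) ** 2 / (2 * (a + b))
    detectable = snr > 1
    exactly_recoverable = (math.sqrt(a) - math.sqrt(b)) ** 2 > 2
    return snr, detectable, exactly_recoverable

print(sbm_regimes(15, 4))  # SNR ≈ 3.18; both thresholds are passed
```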

L.3 Small-World Parameter Calculation

Problem: Design a WS graph on $n = 1000$ nodes that approximates the C. elegans neural connectome: $C \approx 0.28$, $L \approx 2.65$, $k \approx 14$.

Solution:

Step 1: The initial ring lattice has $C(0) = 3(k-2)/(4(k-1)) = 3 \cdot 12 / (4 \cdot 13) = 36/52 \approx 0.692$.

Step 2: Target clustering $C(\beta) \approx 0.28$. Using $C(\beta) \approx C(0)(1-\beta)^3$:

$0.28 \approx 0.692 (1-\beta)^3 \;\Rightarrow\; (1-\beta)^3 \approx 0.404 \;\Rightarrow\; 1-\beta \approx 0.739,\ \beta \approx 0.261$

Step 3: Verify the path length at $\beta = 0.261$. Using the WS approximation $L(\beta) \approx \frac{n}{k} \cdot f(nk\beta/2)$, where $f(u) \approx \frac{\ln u}{u}$ for large $u$:

$nk\beta/2 = 1000 \cdot 14 \cdot 0.261 / 2 = 1827$

$L \approx (1000/14) \cdot \ln(1827)/1827 \approx 71.4 \cdot 7.51 / 1827 \approx 0.29$

This is too small - the approximation breaks down at large $\beta$. Numerically, $L(\beta = 0.26) \approx 3.5$ for $n = 1000$, $k = 14$, which is reasonably close to the target $L = 2.65$.

Conclusion: WS parameters $(n = 1000, k = 14, \beta \approx 0.26)$ produce a graph close to C. elegans in both clustering and path length, validating the small-world model.
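Step 2's inversion is a one-liner; a sketch (function name illustrative). Verifying $L(\beta)$ would additionally require simulating the graph (e.g. with networkx's watts_strogatz_graph), which is omitted here:

```python
def ws_beta_for_clustering(k, C_target):
    """Invert C(beta) ≈ C(0) (1 - beta)^3 for the WS rewiring probability."""
    C0 = 3 * (k - 2) / (4 * (k - 1))  # ring-lattice clustering C(0)
    return 1 - (C_target / C0) ** (1 / 3)

print(ws_beta_for_clustering(14, 0.28))  # ≈ 0.26, matching Step 2
```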

L.4 Graphon Estimation from Data

Problem: Given a graph $G$ with 100 nodes known to be sampled from a 3-block SBM, estimate the graphon.

Procedure:

  1. Run spectral clustering with $k = 3$ to get estimated community labels $\hat{\sigma}$.

  2. Sort vertices by $\hat{\sigma}$: reorder the adjacency matrix so community 1 nodes come first, then community 2, then community 3.

  3. Estimate block probabilities: for communities $r, s$:

$$\hat{B}_{rs} = \frac{|\{(u,v) \in E : \hat{\sigma}(u) = r,\ \hat{\sigma}(v) = s\}|}{|\{(u,v) : \hat{\sigma}(u) = r,\ \hat{\sigma}(v) = s\}|}$$

  4. Construct the step-function graphon:

$$\hat{W}(x, y) = \hat{B}_{\lceil 3x \rceil, \lceil 3y \rceil}$$

(piecewise constant on $[0,1]^2$ divided into $3 \times 3$ blocks)

  5. Evaluate quality: compute the cut distance $d_\square(\hat{W}, W^*)$, where $W^*$ is the true graphon. The cut distance converges to 0 as $n \to \infty$.

Error bound: For a $k$-block SBM, the best graphon estimator achieves error $O(k/\sqrt{n})$ in cut distance (minimax optimal). With $n = 100$ and $k = 3$, the expected cut distance is $\approx 3/10 = 0.3$ - substantial estimation uncertainty.
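Steps 3-4 amount to block averaging; a numpy sketch (function names illustrative):

```python
import numpy as np

def estimate_block_matrix(A, labels, k):
    """Estimate SBM block probabilities B_hat[r, s] (step 3) from an
    adjacency matrix A and (estimated) labels in {0, ..., k-1}."""
    B_hat = np.zeros((k, k))
    for r in range(k):
        for s in range(k):
            rows = np.where(labels == r)[0]
            cols = np.where(labels == s)[0]
            block = A[np.ix_(rows, cols)]
            if r == s:
                pairs = len(rows) * (len(rows) - 1)  # exclude self-pairs
                edges = block.sum() - np.trace(block)
            else:
                pairs = len(rows) * len(cols)
                edges = block.sum()
            B_hat[r, s] = edges / max(pairs, 1)
    return B_hat

def W_hat(x, y, B_hat):
    """Step-function graphon W_hat(x, y) = B_hat[ceil(kx), ceil(ky)] (step 4)."""
    k = B_hat.shape[0]
    i = min(max(int(np.ceil(k * x)) - 1, 0), k - 1)  # clamp endpoints of [0, 1]
    j = min(max(int(np.ceil(k * y)) - 1, 0), k - 1)
    return B_hat[i, j]
```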


Appendix M: Connections to Statistical Physics

M.1 Random Graphs and Spin Glasses

The Stochastic Block Model community detection problem is isomorphic to an Ising model on the graph:

  • Community labels $\sigma_v \in \{+1, -1\}$ correspond to spin states
  • Edges within communities are "ferromagnetic" (prefer aligned spins)
  • Edges between communities are "antiferromagnetic" (prefer anti-aligned)

The Kesten-Stigum threshold corresponds to the Nishimori temperature in the disordered Ising model - the temperature at which the magnetization (overlap with ground truth) first becomes nonzero.

Belief propagation = Cavity method: The belief propagation algorithm for community detection in SBM is the Bethe approximation applied to the Ising model. Near the KS threshold, BP is asymptotically optimal - no polynomial-time algorithm can do better.

M.2 Percolation and Statistical Mechanics

Bond percolation on $\mathbb{Z}^2$ is a model in statistical mechanics. Its critical phenomena:

  • Order parameter: $P_\infty(\rho) = \mathbb{P}[\text{vertex is in infinite component}]$
  • Critical exponent: $P_\infty(\rho) \sim (\rho - \rho_c)^\beta$ with $\beta = 5/36$ (exact, 2D)
  • Correlation length: $\xi(\rho) \sim |\rho - \rho_c|^{-\nu}$ with $\nu = 4/3$ (exact, 2D)

For ER graphs: $P_\infty(c) = \beta(c)$ with critical exponent 1 (mean-field universality class, since ER has "infinite dimension").

Conformal invariance (2D percolation): Near criticality, 2D percolation is conformally invariant - the scaling limit is described by SLE (Schramm-Loewner Evolution). This deep connection between combinatorics and complex analysis has no direct ML application yet, but illustrates the richness of the field.

M.3 Free Energy and the Replica Method

The replica method is a non-rigorous but powerful physics technique for computing expectations over random graphs.

For the SBM community detection problem, the free energy (log-partition function) per vertex is:

$$f = \lim_{n \to \infty} \frac{1}{n} \mathbb{E}[\log Z(\sigma, G)]$$

where $Z = \sum_\sigma e^{H(\sigma, G)}$ and $H$ is the log-likelihood of the community assignment.

The replica calculation predicts the exact phase diagram of the SBM - both the KS threshold and the exact recovery threshold - and was later made rigorous using interpolation methods (Guerra, Talagrand).

ML connection: Free energy calculations in statistical physics are the rigorous foundation of variational inference in machine learning. The Bethe free energy (used in belief propagation) is the physics analog of the ELBO (evidence lower bound) in variational autoencoders.




Appendix N: Extended Exercise Solutions

N.1 Solution Notes for Exercise 1 (Phase Transition)

Key implementation details:

The fixed-point equation $\beta = 1 - e^{-c\beta}$ can be solved numerically via Newton's method:

$$\beta_{k+1} = \beta_k - \frac{\beta_k - 1 + e^{-c\beta_k}}{1 - c\,e^{-c\beta_k}}$$

Starting from $\beta_0 = 1 - e^{-c}$ (one step of Picard iteration), Newton's method converges in 5-10 iterations for $c \in [1, 5]$.

For $c \le 1$: the only fixed point is $\beta = 0$ (return 0). For $c > 1$: there are two fixed points (0 and the positive root); return the positive root.

Expected results table:

| $c$ | Theoretical $\beta(c)$ | Simulated $L_1/n$ (mean ± std) |
|-----|------------------------|--------------------------------|
| 0.5 | 0 | $O(\log n / n) \approx 0.004$ |
| 0.8 | 0 | $O(\log n / n) \approx 0.006$ |
| 1.0 | 0 (but $n^{-1/3}$ scaling) | $\approx 0.15 \cdot n^{-1/3}$ |
| 1.5 | 0.583 | $0.583 \pm 0.015$ |
| 2.0 | 0.797 | $0.797 \pm 0.008$ |
| 2.5 | 0.892 | $0.892 \pm 0.005$ |
| 3.0 | 0.940 | $0.940 \pm 0.003$ |

Why $L_2$ peaks at criticality: The second-largest component size peaks at $c = 1$ (size $\sim n^{2/3}$) because this is where the giant component is just forming - there are many large components competing. For $c > 1$, the giant component absorbs everything, leaving only $O(\log n)$ satellites.
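The $L_1/n$ column can be spot-checked with a direct simulation; a minimal union-find sketch (function name illustrative):

```python
import random

def component_fractions(n, c, seed=0):
    """Sample G(n, p) with p = c/n and return (L1/n, L2/n),
    the two largest component fractions, via union-find."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if size[rx] < size[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]

    # Each of the C(n, 2) possible edges is present independently w.p. p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                union(i, j)

    sizes = sorted((size[r] for r in range(n) if find(r) == r), reverse=True)
    L2 = sizes[1] if len(sizes) > 1 else 0
    return sizes[0] / n, L2 / n

print(component_fractions(1000, 2.5))  # L1/n typically lands near 0.89
```

The double loop is $O(n^2)$; for large $n$ one would sample the Binomial number of edges first and draw endpoints directly.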

N.2 Solution Notes for Exercise 4 (Community Detection)

Spectral algorithm implementation:

import numpy as np

def spectral_community_detection(A, k=2):
    """Spectral clustering for SBM community detection.

    (k is kept for interface symmetry; this implementation splits on the
    sign of the second eigenvector, so it handles k = 2.)
    """
    n = A.shape[0]
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-10))  # guard isolated vertices
    L_sym = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

    # Eigendecomposition: community structure lives in the smallest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(L_sym)

    # Second eigenvector (first non-trivial one); its sign pattern gives the split
    u2 = eigvecs[:, 1]
    labels = (u2 > 0).astype(int)
    return labels

def community_accuracy(labels_est, labels_true):
    """Fraction of correctly labeled vertices, up to a global label flip."""
    acc1 = np.mean(labels_est == labels_true)
    acc2 = np.mean(labels_est == (1 - labels_true))
    return max(acc1, acc2)

Expected accuracy table:

| $(a, b)$ | SNR $= (a-b)^2/(2(a+b))$ | Regime | Expected accuracy |
|----------|--------------------------|--------|-------------------|
| $(20, 5)$ | $225/50 = 4.5$ | Well above KS | 0.95-0.99 |
| $(10, 5)$ | $25/30 = 0.83$ | Below KS | $\approx 0.5$ (random) |
| $(6, 4)$ | $4/20 = 0.2$ | Below KS | $\approx 0.5$ (random) |
| $(8, 3)$ | $25/22 = 1.14$ | Just above KS | 0.55-0.65 |
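Reproducing the table also requires sampling SBM graphs; a minimal generator sketch (function name illustrative, n assumed even) whose output can be passed to spectral_community_detection above:

```python
import numpy as np

def sample_sbm(n, a, b, seed=0):
    """Sample a 2-block SBM adjacency matrix with p = a/n inside
    communities and q = b/n between them; returns (A, true labels)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    # Entrywise edge probabilities: p where labels agree, q where they differ
    P = np.where(labels[:, None] == labels[None, :], a / n, b / n)
    U = rng.random((n, n))
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)  # sample upper triangle
    A = A + A.T                                        # symmetrize
    return A, labels
```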

N.3 Solution Notes for Exercise 8 (Preferential Attachment)

Efficient alias sampling for preferential attachment:

The naive approach (a linear scan to sample proportionally to degree) is $O(n)$ per edge, giving $O(n^2)$ total. For large $n$, use the alias method:

  1. Maintain a list of $(d_v, v)$ pairs for all existing nodes
  2. Use the alias method to sample from this distribution in $O(1)$ per sample
  3. After adding new edges, update the alias table in $O(m)$ per step

Alternatively, use the Vose-Walker alias method, which supports updates efficiently.

Simpler $O(n \log n)$ approach: Maintain a binary indexed tree (Fenwick tree) over cumulative degrees. Sampling is $O(\log n)$ per edge, updating is $O(\log n)$ per edge, giving $O(nm \log n)$ total.
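A sketch of the Fenwick-tree approach (class and function names illustrative):

```python
import random

class FenwickSampler:
    """Sample nodes with probability proportional to degree in O(log n),
    with O(log n) degree updates (for preferential attachment)."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed Fenwick tree of degrees
        self.total = 0

    def add(self, v, w=1):
        """Increase node v's degree by w."""
        self.total += w
        i = v + 1
        while i <= self.n:
            self.tree[i] += w
            i += i & (-i)

    def sample(self, rng):
        """Return a node with probability degree / total_degree."""
        u = rng.random() * self.total
        pos, mask = 0, 1 << self.n.bit_length()
        while mask:  # binary search for the first prefix sum >= u
            nxt = pos + mask
            if nxt <= self.n and self.tree[nxt] < u:
                u -= self.tree[nxt]
                pos = nxt
            mask >>= 1
        return pos  # 0-indexed node id

def barabasi_albert(n, m, seed=0):
    """BA model: each new node attaches m edges preferentially by degree."""
    rng = random.Random(seed)
    s = FenwickSampler(n)
    edges = []
    for i in range(m + 1):          # seed clique on m + 1 nodes
        for j in range(i):
            edges.append((i, j)); s.add(i); s.add(j)
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:     # m distinct degree-weighted targets
            targets.add(s.sample(rng))
        for t in targets:
            edges.append((v, t)); s.add(v); s.add(t)
    return edges
```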

Power law fit: The maximum likelihood estimator for the power-law exponent on data $k_1, \ldots, k_m$ with $k_i \ge k_{\min}$:

$$\hat{\gamma} = 1 + m \left[\sum_{i=1}^m \ln\frac{k_i}{k_{\min} - 0.5}\right]^{-1}$$

For BA with $m = 2$ and $k_{\min} = 10$: expect $\hat{\gamma} \approx 3.0 \pm 0.2$ for $n = 10000$.
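The estimator can be sanity-checked on synthetic power-law data; a short sketch (function name illustrative). Note the $k_{\min} - 0.5$ correction is tuned for discrete data, so on continuous samples the estimate is biased slightly low:

```python
import math, random

def powerlaw_mle(ks, k_min):
    """MLE for the power-law exponent, with the k_min - 1/2
    discreteness correction used in the formula above."""
    tail = [k for k in ks if k >= k_min]
    m = len(tail)
    return 1 + m / sum(math.log(k / (k_min - 0.5)) for k in tail)

# Continuous samples from P(k) ~ k^{-3}, k >= 10, via inverse transform
rng = random.Random(0)
ks = [10 * (1 - rng.random()) ** (-0.5) for _ in range(20000)]
print(powerlaw_mle(ks, 10))  # close to 3
```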


Appendix O: Further Reading

O.1 Textbooks

  1. Bollobás, B. (2001). Random Graphs (2nd ed.). Cambridge University Press.

    • The definitive mathematical treatment; covers ER theory exhaustively.
  2. Janson, S., Łuczak, T., Ruciński, A. (2000). Random Graphs. Wiley.

    • More accessible than Bollobás; covers first and second moment methods thoroughly.
  3. Lovász, L. (2012). Large Networks and Graph Limits. AMS.

    • The definitive treatment of graphon theory by its creator.
  4. Durrett, R. (2007). Random Graph Dynamics. Cambridge University Press.

    • Dynamical random graphs and their applications to epidemics and social networks.

O.2 Foundational Papers

  1. Erdős, P., Rényi, A. (1960). On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5, 17-61.

  2. Watts, D.J., Strogatz, S.H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, 440-442.

  3. Barabási, A.-L., Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509-512.

  4. Lovász, L., Szegedy, B. (2006). Limits of dense graph sequences. J. Combin. Theory Ser. B, 96, 933-957.

  5. Mossel, E., Neeman, J., Sly, A. (2015). Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields, 162, 431-461.

  6. Abbe, E., Sandon, C. (2015). Community detection in general stochastic block models: Fundamental limits and efficient recovery algorithms. arXiv:1503.00609.

O.3 ML-Specific Papers

  1. Palowitch, J. et al. (2022). GraphWorld: Fake Graphs Bring Real Insights for GNNs. KDD 2022.

  2. Keriven, N., Peyré, G. (2019). Universal Invariant and Equivariant Graph Neural Networks. NeurIPS 2019.

  3. Vignac, C. et al. (2023). DiGress: Discrete Denoising Diffusion for Graph Generation. ICLR 2023.

  4. Rusch, T.K., Bronstein, M.M., Mishra, S. (2023). A survey on oversmoothing in graph neural networks. arXiv:2303.10993.

  5. Broido, A.D., Clauset, A. (2019). Scale-free networks are rare. Nature Communications, 10, 1017.




Appendix P: Summary of Key Inequalities

P.1 Concentration Inequalities for Random Graphs

Markov's inequality: For non-negative $X$:

$$\mathbb{P}[X \ge t] \le \frac{\mathbb{E}[X]}{t}$$

Chebyshev's inequality:

$$\mathbb{P}[|X - \mathbb{E}[X]| \ge t] \le \frac{\mathrm{Var}(X)}{t^2}$$

Chernoff bound (Poisson random variables): For $X \sim \mathrm{Poisson}(\mu)$:

$$\mathbb{P}[X \ge (1+\delta)\mu] \le e^{-\mu \delta^2 / (2+\delta)}, \qquad \mathbb{P}[X \le (1-\delta)\mu] \le e^{-\mu \delta^2 / 2}$$
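The upper-tail bound can be spot-checked by Monte Carlo; a sketch (helper name illustrative) using Knuth's Poisson sampler:

```python
import math, random

mu, delta, trials = 10.0, 0.5, 200_000
rng = random.Random(0)

def poisson(rng, mu):
    """Knuth's method: multiply uniforms until the product drops below e^{-mu}."""
    L, k, p = math.exp(-mu), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

tail = sum(poisson(rng, mu) >= (1 + delta) * mu for _ in range(trials)) / trials
bound = math.exp(-mu * delta**2 / (2 + delta))
print(tail <= bound)  # True: the empirical tail sits below the Chernoff bound
```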

Paley-Zygmund inequality: For non-negative $X$ with finite second moment:

$$\mathbb{P}[X > 0] \ge \frac{(\mathbb{E}[X])^2}{\mathbb{E}[X^2]}$$

Azuma-Hoeffding (martingale concentration): If $(X_0, X_1, \ldots, X_m)$ is a martingale with $|X_k - X_{k-1}| \le c_k$:

$$\mathbb{P}[|X_m - X_0| \ge t] \le 2\exp\left(-\frac{t^2}{2\sum_k c_k^2}\right)$$

For random graph properties: reveal edges one at a time. Each edge revelation changes the property value by at most $c_k$ (the Lipschitz constant of the property). Azuma-Hoeffding then gives concentration of order $\sqrt{m} = O(n)$ around the mean.

McDiarmid's inequality (bounded differences): If $f(G)$ changes by at most $c$ when one edge is added or removed:

$$\mathbb{P}[|f(G) - \mathbb{E}[f(G)]| \ge t] \le 2\exp\left(-\frac{t^2}{2\binom{n}{2}c^2}\right)$$

Applications: triangle count ($c = O(n)$ per edge change), largest clique ($c = 1$), chromatic number ($c = 1$).

P.2 Spectral Inequalities

Perron-Frobenius: For non-negative irreducible $A$, $\lambda_1(A) > 0$ and the leading eigenvector has all positive entries.

Courant-Fischer (min-max): The $k$-th largest eigenvalue of symmetric $A$ satisfies:

$$\lambda_k(A) = \min_{\dim V = n-k+1} \ \max_{\mathbf{x} \in V,\ \|\mathbf{x}\|=1} \mathbf{x}^\top A \mathbf{x}$$

Weyl's theorem: For symmetric $A$ and $E$:

$$|\lambda_k(A+E) - \lambda_k(A)| \le \|E\|_{op} \quad \forall k$$

Bauer-Fike: If $A$ is diagonalizable with eigenvector condition number $\kappa$, eigenvalue perturbations are bounded by $\kappa \|E\|$. For symmetric $A$: $\kappa = 1$ (always), recovering Weyl's theorem.
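Weyl's bound is easy to verify numerically; a quick sketch with a random symmetric matrix:

```python
import numpy as np

# Numerical check of Weyl's theorem: every eigenvalue of a symmetric matrix
# moves by at most the spectral norm of the (symmetric) perturbation.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)); A = (A + A.T) / 2
E = 0.1 * rng.standard_normal((8, 8)); E = (E + E.T) / 2

# eigvalsh returns eigenvalues in ascending order for both matrices
shift = np.abs(np.linalg.eigvalsh(A + E) - np.linalg.eigvalsh(A)).max()
print(shift <= np.linalg.norm(E, 2))  # True
```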

Cheeger inequality: For the graph Laplacian $L$, the Cheeger constant $h(G) = \min_{S : |S| \le n/2} |E(S, \bar{S})| / |S|$ satisfies:

$$\frac{\lambda_2}{2} \le h(G) \le \sqrt{2\lambda_2}$$

This is the fundamental connection between algebraic (spectral gap) and geometric (cut quality) properties of graphs.




Appendix Q: Quick Reference Card

RANDOM GRAPH QUICK REFERENCE
========================================================================

  ERDOS-RENYI G(n,p)
  ------------------
  - Average degree:   np
  - Degree dist:      Poisson(np) as n->\infty
  - Giant component:  exists iff p > 1/n  ->  fraction \beta(c) w/ c=np
  - Connectivity:     w.h.p. iff p \geq ln(n)/n
  - Diameter:         ~ log(n)/log(np) when connected
  - Clustering:       ~ p -> 0  (locally tree-like)

  WATTS-STROGATZ (n, k, \beta)
  -------------------------
  - Degree:            \approx k (Poisson-like spread from rewiring)
  - Clustering:        C(0)(1-\beta)^3  where C(0) \approx 3/4 for large k
  - Path length:       O(log n) for any \beta > 0
  - Small-world:       \beta \in [1/(nk), 0.3] gives C >> C_ER and L \approx L_ER

  BARABASI-ALBERT (n, m)
  -----------------------
  - Degree dist:       P(k) ~ 2m^2/k^3  (power law \gamma=3)
  - Max degree:        ~ m\sqrt{n}  (oldest node)
  - Diameter:          ~ log(n)/log(log(n))
  - Clustering:        C ~ (log n)^2/n -> 0  (no clustering!)

  STOCHASTIC BLOCK MODEL SBM(n,k,\sigma,B)
  -------------------------------------
  - KS threshold:      (a-b)^2 = 2(a+b)  for 2-block, p=a/n, q=b/n
  - Exact recovery:    (\sqrt{a} - \sqrt{b})^2 = 2   for log-degree regime
  - Spectral alg:      works above KS threshold
  - GNN limit:         No GNN beats KS threshold on SBM data

  GRAPHONS W:[0,1]^2->[0,1]
  -------------------------
  - Dense graph limit: every dense G-sequence converges to a graphon
  - Sampling:          draw \xi_i ~ U[0,1]; edge (i,j) w.p. W(\xi_i, \xi_j)
  - ER graphon:        W(x,y) = p  (constant)
  - SBM graphon:       W(x,y) = B_{\lceil kx \rceil, \lceil ky \rceil}  (step function)
  - Operator:          T_W h(x) = \int_0^1 W(x,y)h(y)dy  (Hilbert-Schmidt)

========================================================================
