
Random Graphs: Appendix L: Worked Examples to Appendix Q: Quick Reference Card

Appendix L: Worked Examples

L.1 Computing the Giant Component Fraction

Problem: For $G(n,p)$ with $p = 2.5/n$, find the theoretical fraction of vertices in the giant component.

Solution: We need $\beta$ satisfying $\beta = 1 - e^{-2.5\beta}$.

Define $f(\beta) = 1 - e^{-2.5\beta} - \beta$. We need $f(\beta) = 0$.

  • $f(0) = 0$ (trivial solution; in the subcritical regime this would be the only root)
  • $f'(\beta) = 2.5 e^{-2.5\beta} - 1$; $f'(0) = 1.5 > 0$ (the slope at 0 is positive)
  • $f$ is concave for $\beta > 0$, so there is exactly one non-trivial root

Newton's method: Starting from $\beta_0 = 0.7$:

  • $f(0.7) = 1 - e^{-1.75} - 0.7 = 1 - 0.1738 - 0.7 = 0.1262$
  • $f'(0.7) = 2.5 \cdot 0.1738 - 1 = -0.5655$
  • $\beta_1 = 0.7 - 0.1262/(-0.5655) = 0.7 + 0.2232 = 0.923$ (overshoot, so switch to bisection)

Using bisection between 0.7 and 0.95:

  • $f(0.8) = 1 - e^{-2} - 0.8 = 1 - 0.1353 - 0.8 = 0.0647 > 0$
  • $f(0.85) = 1 - e^{-2.125} - 0.85 = 1 - 0.1194 - 0.85 = 0.0306 > 0$
  • $f(0.88) \approx 1 - 0.1108 - 0.88 = 0.0092 > 0$
  • $f(0.89) \approx 1 - 0.1084 - 0.89 = 0.0016 > 0$
  • $f(0.895) \approx 1 - 0.1072 - 0.895 = -0.0022 < 0$

So $\beta \approx 0.892$ - about 89.2% of vertices are in the giant component.

Interpretation: At average degree 2.5, the network is well into the supercritical regime. The vast majority of vertices are connected in one giant component, leaving only about 10.8% in small satellite components.
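The same root can be found programmatically; a minimal bisection sketch (function name illustrative, not from the lesson):

```python
import math

def giant_component_fraction(c, tol=1e-10):
    """Solve beta = 1 - exp(-c*beta) for the non-trivial root (c > 1)."""
    if c <= 1:
        return 0.0  # subcritical: beta = 0 is the only root
    f = lambda b: 1 - math.exp(-c * b) - b
    lo, hi = 1e-12, 1.0  # f(lo) > 0 since the slope at 0 is c - 1 > 0; f(1) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(giant_component_fraction(2.5))  # about 0.89, matching the bisection above
```

Bisection is slower than Newton's method but cannot overshoot, which is convenient given the concave shape of $f$.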

L.2 Kesten-Stigum Threshold Calculation

Problem: For the 2-block SBM with $a = 15$ and $b = 4$, determine: (a) Is community detection possible? (b) Can we achieve exact recovery?

Solution:

(a) Detection (weak recovery): the Kesten-Stigum threshold requires $(a-b)^2 > 2(a+b)$.

$(15-4)^2 = 11^2 = 121$

$2(15+4) = 2 \cdot 19 = 38$

$121 > 38$, so detection is possible.

SNR $= (a-b)^2 / (2(a+b)) = 121/38 \approx 3.18 \gg 1$.

(b) Exact recovery: Need $(\sqrt{a} - \sqrt{b})^2 > 2$ (the scaled threshold for the logarithmic-degree regime).

$\sqrt{15} \approx 3.873$, $\sqrt{4} = 2$.

$(\sqrt{15} - \sqrt{4})^2 = (3.873 - 2)^2 = (1.873)^2 \approx 3.51 > 2$, so exact recovery is achievable.

Interpretation: With $a = 15$ and $b = 4$, we have a relatively easy community detection problem - both detection and exact recovery are achievable. Spectral clustering should work well here.
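Both threshold checks are easy to automate; a sketch (function name illustrative):

```python
import math

def sbm_regimes(a, b):
    """Check the 2-block SBM thresholds for within/between parameters a, b.

    Detection: Kesten-Stigum condition (a-b)^2 > 2(a+b) (constant-degree regime).
    Exact recovery: (sqrt(a) - sqrt(b))^2 > 2 (logarithmic-degree regime).
    """
    snr = (a - b) ** 2 / (2 * (a + b))
    detectable = snr > 1
    exactly_recoverable = (math.sqrt(a) - math.sqrt(b)) ** 2 > 2
    return snr, detectable, exactly_recoverable

print(sbm_regimes(15, 4))  # SNR ≈ 3.18; both thresholds are passed
```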

L.3 Small-World Parameter Calculation

Problem: Design a WS graph on $n = 1000$ nodes that approximates the C. elegans neural connectome: $C \approx 0.28$, $L \approx 2.65$, $k \approx 14$.

Solution:

Step 1: The initial ring lattice has $C(0) = 3(k-2)/(4(k-1)) = 3 \cdot 12 / (4 \cdot 13) = 36/52 \approx 0.692$.

Step 2: Target clustering $C(\beta) \approx 0.28$. Using $C(\beta) \approx C(0)(1-\beta)^3$:

$0.28 \approx 0.692 (1-\beta)^3 \;\Rightarrow\; (1-\beta)^3 \approx 0.404 \;\Rightarrow\; 1-\beta \approx 0.739,\ \beta \approx 0.261$

Step 3: Verify the path length at $\beta = 0.261$. Using the WS approximation $L(\beta) \approx \frac{n}{k} \cdot f(nk\beta/2)$, where $f(u) \approx \frac{\ln u}{u}$ for large $u$:

$nk\beta/2 = 1000 \cdot 14 \cdot 0.261 / 2 = 1827$

$L \approx (1000/14) \cdot \ln(1827)/1827 \approx 71.4 \cdot 7.51 / 1827 \approx 0.29$

This is too small - the approximation breaks down at large $\beta$. Numerically, $L(\beta = 0.26) \approx 3.5$ for $n = 1000$, $k = 14$, which is reasonably close to the target $L = 2.65$.

Conclusion: WS parameters $(n = 1000, k = 14, \beta \approx 0.26)$ produce a graph close to C. elegans in both clustering and path length, validating the small-world model.
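Step 2's inversion is a one-liner; a sketch (function name illustrative). Verifying $L(\beta)$ would additionally require simulating the graph (e.g. with networkx's watts_strogatz_graph), which is omitted here:

```python
def ws_beta_for_clustering(k, C_target):
    """Invert C(beta) ≈ C(0) (1 - beta)^3 for the WS rewiring probability."""
    C0 = 3 * (k - 2) / (4 * (k - 1))  # ring-lattice clustering C(0)
    return 1 - (C_target / C0) ** (1 / 3)

print(ws_beta_for_clustering(14, 0.28))  # ≈ 0.26, matching Step 2
```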

L.4 Graphon Estimation from Data

Problem: Given a graph $G$ with 100 nodes known to be sampled from a 3-block SBM, estimate the graphon.

Procedure:

  1. Run spectral clustering with $k = 3$ to get estimated community labels $\hat{\sigma}$.

  2. Sort vertices by $\hat{\sigma}$: reorder the adjacency matrix so community 1 nodes come first, then community 2, then community 3.

  3. Estimate block probabilities: for communities $r, s$:

$$\hat{B}_{rs} = \frac{|\{(u,v) \in E : \hat{\sigma}(u) = r,\ \hat{\sigma}(v) = s\}|}{|\{(u,v) : \hat{\sigma}(u) = r,\ \hat{\sigma}(v) = s\}|}$$

  4. Construct the step-function graphon:

$$\hat{W}(x, y) = \hat{B}_{\lceil 3x \rceil, \lceil 3y \rceil}$$

(piecewise constant on $[0,1]^2$ divided into $3 \times 3$ blocks)

  5. Evaluate quality: compute the cut distance $d_\square(\hat{W}, W^*)$, where $W^*$ is the true graphon. The cut distance converges to 0 as $n \to \infty$.

Error bound: For a $k$-block SBM, the best graphon estimator achieves error $O(k/\sqrt{n})$ in cut distance (minimax optimal). With $n = 100$ and $k = 3$, the expected cut distance is $\approx 3/10 = 0.3$ - substantial estimation uncertainty.
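Steps 3-4 amount to block averaging; a numpy sketch (function names illustrative):

```python
import numpy as np

def estimate_block_matrix(A, labels, k):
    """Estimate SBM block probabilities B_hat[r, s] (step 3) from an
    adjacency matrix A and (estimated) labels in {0, ..., k-1}."""
    B_hat = np.zeros((k, k))
    for r in range(k):
        for s in range(k):
            rows = np.where(labels == r)[0]
            cols = np.where(labels == s)[0]
            block = A[np.ix_(rows, cols)]
            if r == s:
                pairs = len(rows) * (len(rows) - 1)  # exclude self-pairs
                edges = block.sum() - np.trace(block)
            else:
                pairs = len(rows) * len(cols)
                edges = block.sum()
            B_hat[r, s] = edges / max(pairs, 1)
    return B_hat

def W_hat(x, y, B_hat):
    """Step-function graphon W_hat(x, y) = B_hat[ceil(kx), ceil(ky)] (step 4)."""
    k = B_hat.shape[0]
    i = min(max(int(np.ceil(k * x)) - 1, 0), k - 1)  # clamp endpoints of [0, 1]
    j = min(max(int(np.ceil(k * y)) - 1, 0), k - 1)
    return B_hat[i, j]
```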


Appendix M: Connections to Statistical Physics

M.1 Random Graphs and Spin Glasses

The Stochastic Block Model community detection problem is isomorphic to an Ising model on the graph:

  • Community labels $\sigma_v \in \{+1, -1\}$ correspond to spin states
  • Edges within communities are "ferromagnetic" (prefer aligned spins)
  • Edges between communities are "antiferromagnetic" (prefer anti-aligned)

The Kesten-Stigum threshold corresponds to the Nishimori temperature in the disordered Ising model - the temperature at which the magnetization (overlap with ground truth) first becomes nonzero.

Belief propagation = Cavity method: The belief propagation algorithm for community detection in SBM is the Bethe approximation applied to the Ising model. Near the KS threshold, BP is asymptotically optimal - no polynomial-time algorithm can do better.

M.2 Percolation and Statistical Mechanics

Bond percolation on $\mathbb{Z}^2$ is a model in statistical mechanics. Its critical phenomena:

  • Order parameter: $P_\infty(\rho) = \mathbb{P}[\text{vertex is in infinite component}]$
  • Critical exponent: $P_\infty(\rho) \sim (\rho - \rho_c)^\beta$ with $\beta = 5/36$ (exact, 2D)
  • Correlation length: $\xi(\rho) \sim |\rho - \rho_c|^{-\nu}$ with $\nu = 4/3$ (exact, 2D)

For ER graphs: $P_\infty(c) = \beta(c)$ with critical exponent 1 (mean-field universality class, since ER has "infinite dimension").

Conformal invariance (2D percolation): Near criticality, 2D percolation is conformally invariant - the scaling limit is described by SLE (Schramm-Loewner Evolution). This deep connection between combinatorics and complex analysis has no direct ML application yet, but illustrates the richness of the field.

M.3 Free Energy and the Replica Method

The replica method is a non-rigorous but powerful physics technique for computing expectations over random graphs.

For the SBM community detection problem, the free energy (log-partition function) per vertex is:

$$f = \lim_{n \to \infty} \frac{1}{n} \mathbb{E}[\log Z(\sigma, G)]$$

where $Z = \sum_\sigma e^{H(\sigma, G)}$ and $H$ is the log-likelihood of the community assignment.

The replica calculation predicts the exact phase diagram of the SBM - both the KS threshold and the exact recovery threshold - and was later made rigorous using interpolation methods (Guerra, Talagrand).

ML connection: Free energy calculations in statistical physics are the rigorous foundation of variational inference in machine learning. The Bethe free energy (used in belief propagation) is the physics analog of the ELBO (evidence lower bound) in variational autoencoders.




Appendix N: Extended Exercise Solutions

N.1 Solution Notes for Exercise 1 (Phase Transition)

Key implementation details:

The fixed-point equation $\beta = 1 - e^{-c\beta}$ can be solved numerically via Newton's method:

$$\beta_{k+1} = \beta_k - \frac{\beta_k - 1 + e^{-c\beta_k}}{1 - c\,e^{-c\beta_k}}$$

Starting from $\beta_0 = 1 - e^{-c}$ (one step of Picard iteration), Newton's method converges in 5-10 iterations for $c \in [1, 5]$.

For $c \le 1$: the only fixed point is $\beta = 0$ (return 0). For $c > 1$: there are two fixed points (0 and the positive root); return the positive root.

Expected results table:

| $c$ | Theoretical $\beta(c)$ | Simulated $L_1/n$ (mean ± std) |
|-----|------------------------|--------------------------------|
| 0.5 | 0 | $O(\log n / n) \approx 0.004$ |
| 0.8 | 0 | $O(\log n / n) \approx 0.006$ |
| 1.0 | 0 (but $n^{-1/3}$ scaling) | $\approx 0.15 \cdot n^{-1/3}$ |
| 1.5 | 0.583 | $0.583 \pm 0.015$ |
| 2.0 | 0.797 | $0.797 \pm 0.008$ |
| 2.5 | 0.892 | $0.892 \pm 0.005$ |
| 3.0 | 0.940 | $0.940 \pm 0.003$ |

Why $L_2$ peaks at criticality: The second-largest component size peaks at $c = 1$ (size $\sim n^{2/3}$) because this is where the giant component is just forming - there are many large components competing. For $c > 1$, the giant component absorbs everything, leaving only $O(\log n)$ satellites.
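The $L_1/n$ column can be spot-checked with a direct simulation; a minimal union-find sketch (function name illustrative):

```python
import random

def component_fractions(n, c, seed=0):
    """Sample G(n, p) with p = c/n and return (L1/n, L2/n),
    the two largest component fractions, via union-find."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if size[rx] < size[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]

    # Each of the C(n, 2) possible edges is present independently w.p. p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                union(i, j)

    sizes = sorted((size[r] for r in range(n) if find(r) == r), reverse=True)
    L2 = sizes[1] if len(sizes) > 1 else 0
    return sizes[0] / n, L2 / n

print(component_fractions(1000, 2.5))  # L1/n typically lands near 0.89
```

The double loop is $O(n^2)$; for large $n$ one would sample the Binomial number of edges first and draw endpoints directly.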

N.2 Solution Notes for Exercise 4 (Community Detection)

Spectral algorithm implementation:

import numpy as np

def spectral_community_detection(A, k=2):
    """Spectral clustering for SBM community detection.

    (k is kept for interface symmetry; this implementation splits on the
    sign of the second eigenvector, so it handles k = 2.)
    """
    n = A.shape[0]
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-10))  # guard isolated vertices
    L_sym = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

    # Eigendecomposition: community structure lives in the smallest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(L_sym)

    # Second eigenvector (first non-trivial one); its sign pattern gives the split
    u2 = eigvecs[:, 1]
    labels = (u2 > 0).astype(int)
    return labels

def community_accuracy(labels_est, labels_true):
    """Fraction of correctly labeled vertices, up to a global label flip."""
    acc1 = np.mean(labels_est == labels_true)
    acc2 = np.mean(labels_est == (1 - labels_true))
    return max(acc1, acc2)

Expected accuracy table:

| $(a, b)$ | SNR $= (a-b)^2/(2(a+b))$ | Regime | Expected accuracy |
|----------|--------------------------|--------|-------------------|
| $(20, 5)$ | $225/50 = 4.5$ | Well above KS | 0.95-0.99 |
| $(10, 5)$ | $25/30 = 0.83$ | Below KS | $\approx 0.5$ (random) |
| $(6, 4)$ | $4/20 = 0.2$ | Below KS | $\approx 0.5$ (random) |
| $(8, 3)$ | $25/22 = 1.14$ | Just above KS | 0.55-0.65 |
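Reproducing the table also requires sampling SBM graphs; a minimal generator sketch (function name illustrative, n assumed even) whose output can be passed to spectral_community_detection above:

```python
import numpy as np

def sample_sbm(n, a, b, seed=0):
    """Sample a 2-block SBM adjacency matrix with p = a/n inside
    communities and q = b/n between them; returns (A, true labels)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    # Entrywise edge probabilities: p where labels agree, q where they differ
    P = np.where(labels[:, None] == labels[None, :], a / n, b / n)
    U = rng.random((n, n))
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)  # sample upper triangle
    A = A + A.T                                        # symmetrize
    return A, labels
```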

N.3 Solution Notes for Exercise 8 (Preferential Attachment)

Efficient alias sampling for preferential attachment:

The naive approach (a linear scan to sample proportionally to degree) is $O(n)$ per edge, giving $O(n^2)$ total. For large $n$, use the alias method:

  1. Maintain a list of $(d_v, v)$ pairs for all existing nodes
  2. Use the alias method to sample from this distribution in $O(1)$ per sample
  3. After adding new edges, update the alias table in $O(m)$ per step

Alternatively, use the Vose-Walker alias method, which supports updates efficiently.

Simpler $O(n \log n)$ approach: Maintain a binary indexed tree (Fenwick tree) over cumulative degrees. Sampling is $O(\log n)$ per edge, updating is $O(\log n)$ per edge, giving $O(nm \log n)$ total.
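A sketch of the Fenwick-tree approach (class and function names illustrative):

```python
import random

class FenwickSampler:
    """Sample nodes with probability proportional to degree in O(log n),
    with O(log n) degree updates (for preferential attachment)."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed Fenwick tree of degrees
        self.total = 0

    def add(self, v, w=1):
        """Increase node v's degree by w."""
        self.total += w
        i = v + 1
        while i <= self.n:
            self.tree[i] += w
            i += i & (-i)

    def sample(self, rng):
        """Return a node with probability degree / total_degree."""
        u = rng.random() * self.total
        pos, mask = 0, 1 << self.n.bit_length()
        while mask:  # binary search for the first prefix sum >= u
            nxt = pos + mask
            if nxt <= self.n and self.tree[nxt] < u:
                u -= self.tree[nxt]
                pos = nxt
            mask >>= 1
        return pos  # 0-indexed node id

def barabasi_albert(n, m, seed=0):
    """BA model: each new node attaches m edges preferentially by degree."""
    rng = random.Random(seed)
    s = FenwickSampler(n)
    edges = []
    for i in range(m + 1):          # seed clique on m + 1 nodes
        for j in range(i):
            edges.append((i, j)); s.add(i); s.add(j)
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:     # m distinct degree-weighted targets
            targets.add(s.sample(rng))
        for t in targets:
            edges.append((v, t)); s.add(v); s.add(t)
    return edges
```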

Power law fit: The maximum likelihood estimator for the power-law exponent on data $k_1, \ldots, k_m$ with $k_i \ge k_{\min}$:

$$\hat{\gamma} = 1 + m \left[\sum_{i=1}^m \ln\frac{k_i}{k_{\min} - 0.5}\right]^{-1}$$

For BA with $m = 2$ and $k_{\min} = 10$: expect $\hat{\gamma} \approx 3.0 \pm 0.2$ for $n = 10000$.
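The estimator can be sanity-checked on synthetic power-law data; a short sketch (function name illustrative). Note the $k_{\min} - 0.5$ correction is tuned for discrete data, so on continuous samples the estimate is biased slightly low:

```python
import math, random

def powerlaw_mle(ks, k_min):
    """MLE for the power-law exponent, with the k_min - 1/2
    discreteness correction used in the formula above."""
    tail = [k for k in ks if k >= k_min]
    m = len(tail)
    return 1 + m / sum(math.log(k / (k_min - 0.5)) for k in tail)

# Continuous samples from P(k) ~ k^{-3}, k >= 10, via inverse transform
rng = random.Random(0)
ks = [10 * (1 - rng.random()) ** (-0.5) for _ in range(20000)]
print(powerlaw_mle(ks, 10))  # close to 3
```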


Appendix O: Further Reading

O.1 Textbooks

  1. Bollobás, B. (2001). Random Graphs (2nd ed.). Cambridge University Press.

    • The definitive mathematical treatment; covers ER theory exhaustively.
  2. Janson, S., Łuczak, T., Ruciński, A. (2000). Random Graphs. Wiley.

    • More accessible than Bollobás; covers first and second moment methods thoroughly.
  3. Lovász, L. (2012). Large Networks and Graph Limits. AMS.

    • The definitive treatment of graphon theory by its creator.
  4. Durrett, R. (2007). Random Graph Dynamics. Cambridge University Press.

    • Dynamical random graphs and their applications to epidemics and social networks.

O.2 Foundational Papers

  1. Erdős, P., Rényi, A. (1960). On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5, 17-61.

  2. Watts, D.J., Strogatz, S.H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, 440-442.

  3. Barabási, A.-L., Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509-512.

  4. Lovász, L., Szegedy, B. (2006). Limits of dense graph sequences. J. Combin. Theory Ser. B, 96, 933-957.

  5. Mossel, E., Neeman, J., Sly, A. (2015). Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields, 162, 431-461.

  6. Abbe, E., Sandon, C. (2015). Community detection in general stochastic block models: Fundamental limits and efficient recovery algorithms. arXiv:1503.00609.

O.3 ML-Specific Papers

  1. Palowitch, J. et al. (2022). GraphWorld: Fake Graphs Bring Real Insights for GNNs. KDD 2022.

  2. Keriven, N., Peyré, G. (2019). Universal Invariant and Equivariant Graph Neural Networks. NeurIPS 2019.

  3. Vignac, C. et al. (2023). DiGress: Discrete Denoising Diffusion for Graph Generation. ICLR 2023.

  4. Rusch, T.K., Bronstein, M.M., Mishra, S. (2023). A survey on oversmoothing in graph neural networks. arXiv:2303.10993.

  5. Broido, A.D., Clauset, A. (2019). Scale-free networks are rare. Nature Communications, 10, 1017.




Appendix P: Summary of Key Inequalities

P.1 Concentration Inequalities for Random Graphs

Markov's inequality: For non-negative $X$:

$$\mathbb{P}[X \ge t] \le \frac{\mathbb{E}[X]}{t}$$

Chebyshev's inequality:

$$\mathbb{P}[|X - \mathbb{E}[X]| \ge t] \le \frac{\mathrm{Var}(X)}{t^2}$$

Chernoff bound (Poisson random variables): For $X \sim \mathrm{Poisson}(\mu)$:

$$\mathbb{P}[X \ge (1+\delta)\mu] \le e^{-\mu \delta^2 / (2+\delta)}, \qquad \mathbb{P}[X \le (1-\delta)\mu] \le e^{-\mu \delta^2 / 2}$$
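The upper-tail bound can be spot-checked by Monte Carlo; a sketch (helper name illustrative) using Knuth's Poisson sampler:

```python
import math, random

mu, delta, trials = 10.0, 0.5, 200_000
rng = random.Random(0)

def poisson(rng, mu):
    """Knuth's method: multiply uniforms until the product drops below e^{-mu}."""
    L, k, p = math.exp(-mu), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

tail = sum(poisson(rng, mu) >= (1 + delta) * mu for _ in range(trials)) / trials
bound = math.exp(-mu * delta**2 / (2 + delta))
print(tail <= bound)  # True: the empirical tail sits below the Chernoff bound
```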

Paley-Zygmund inequality: For non-negative $X$ with finite second moment:

$$\mathbb{P}[X > 0] \ge \frac{(\mathbb{E}[X])^2}{\mathbb{E}[X^2]}$$

Azuma-Hoeffding (martingale concentration): If $(X_0, X_1, \ldots, X_m)$ is a martingale with $|X_k - X_{k-1}| \le c_k$:

$$\mathbb{P}[|X_m - X_0| \ge t] \le 2\exp\left(-\frac{t^2}{2\sum_k c_k^2}\right)$$

For random graph properties: reveal edges one at a time. Each edge revelation changes the property value by at most $c_k$ (the Lipschitz constant of the property). Azuma-Hoeffding then gives concentration of order $\sqrt{m} = O(n)$ around the mean.

McDiarmid's inequality (bounded differences): If $f(G)$ changes by at most $c$ when one edge is added or removed:

$$\mathbb{P}[|f(G) - \mathbb{E}[f(G)]| \ge t] \le 2\exp\left(-\frac{t^2}{2\binom{n}{2}c^2}\right)$$

Applications: triangle count ($c = O(n)$ per edge change), largest clique ($c = 1$), chromatic number ($c = 1$).

P.2 Spectral Inequalities

Perron-Frobenius: For non-negative irreducible $A$, $\lambda_1(A) > 0$ and the leading eigenvector has all positive entries.

Courant-Fischer (min-max): The $k$-th largest eigenvalue of symmetric $A$ satisfies:

$$\lambda_k(A) = \min_{\dim V = n-k+1} \ \max_{\mathbf{x} \in V,\ \|\mathbf{x}\|=1} \mathbf{x}^\top A \mathbf{x}$$

Weyl's theorem: For symmetric $A$ and $E$:

$$|\lambda_k(A+E) - \lambda_k(A)| \le \|E\|_{op} \quad \forall k$$

Bauer-Fike: If $A$ is diagonalizable with eigenvector condition number $\kappa$, eigenvalue perturbations are bounded by $\kappa \|E\|$. For symmetric $A$: $\kappa = 1$ (always), recovering Weyl's theorem.
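Weyl's bound is easy to verify numerically; a quick sketch with a random symmetric matrix:

```python
import numpy as np

# Numerical check of Weyl's theorem: every eigenvalue of a symmetric matrix
# moves by at most the spectral norm of the (symmetric) perturbation.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)); A = (A + A.T) / 2
E = 0.1 * rng.standard_normal((8, 8)); E = (E + E.T) / 2

# eigvalsh returns eigenvalues in ascending order for both matrices
shift = np.abs(np.linalg.eigvalsh(A + E) - np.linalg.eigvalsh(A)).max()
print(shift <= np.linalg.norm(E, 2))  # True
```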

Cheeger inequality: For the graph Laplacian $L$, the Cheeger constant $h(G) = \min_{S : |S| \le n/2} |E(S, \bar{S})| / |S|$ satisfies:

$$\frac{\lambda_2}{2} \le h(G) \le \sqrt{2\lambda_2}$$

This is the fundamental connection between algebraic (spectral gap) and geometric (cut quality) properties of graphs.




Appendix Q: Quick Reference Card

RANDOM GRAPH QUICK REFERENCE
========================================================================

  ERDOS-RENYI G(n,p)
  ------------------
  - Average degree:   np
  - Degree dist:      Poisson(np) as n->\infty
  - Giant component:  exists iff p > 1/n  ->  fraction \beta(c) w/ c=np
  - Connectivity:     w.h.p. iff p \geq ln(n)/n
  - Diameter:         ~ log(n)/log(np) when connected
  - Clustering:       ~ p -> 0  (locally tree-like)

  WATTS-STROGATZ (n, k, \beta)
  -------------------------
  - Degree:            \approx k (Poisson-like spread from rewiring)
  - Clustering:        C(0)(1-\beta)^3  where C(0) \approx 3/4 for large k
  - Path length:       O(log n) for any \beta > 0
  - Small-world:       \beta \in [1/(nk), 0.3] gives C >> C_ER and L \approx L_ER

  BARABASI-ALBERT (n, m)
  -----------------------
  - Degree dist:       P(k) ~ 2m^2/k^3  (power law \gamma=3)
  - Max degree:        ~ m\sqrt{n}  (oldest node)
  - Diameter:          ~ log(n)/log(log(n))
  - Clustering:        C ~ (log n)^2/n -> 0  (no clustering!)

  STOCHASTIC BLOCK MODEL SBM(n,k,\sigma,B)
  -------------------------------------
  - KS threshold:      (a-b)^2 = 2(a+b)  for 2-block, p=a/n, q=b/n
  - Exact recovery:    (\sqrt{a} - \sqrt{b})^2 = 2   for log-degree regime
  - Spectral alg:      works above KS threshold
  - GNN limit:         No GNN beats KS threshold on SBM data

  GRAPHONS W:[0,1]^2->[0,1]
  -------------------------
  - Dense graph limit: every dense G-sequence converges to a graphon
  - Sampling:          draw \xi_i ~ U[0,1]; edge (i,j) w.p. W(\xi_i, \xi_j)
  - ER graphon:        W(x,y) = p  (constant)
  - SBM graphon:       W(x,y) = B_{\lceil kx \rceil, \lceil ky \rceil}  (step function)
  - Operator:          T_W h(x) = \int_0^1 W(x,y)h(y)dy  (Hilbert-Schmidt)

========================================================================
