Lesson overview | Previous part | Next part
Graph Basics: Sections 5 (Connectivity) through 10 (Preview: Representations, Algorithms, and Spectral Methods)
5. Connectivity
5.1 Connected Graphs and Components
Definition (Connected Graph). An undirected graph $G = (V, E)$ is connected if for every pair of vertices $u, v \in V$, there exists a path from $u$ to $v$.
Definition (Connected Component). A connected component of $G$ is a maximal connected subgraph - a connected subgraph that is not a proper subgraph of any larger connected subgraph.
Every graph can be uniquely decomposed into its connected components. If $G$ has $c$ connected components, then $c = 1$ means $G$ is connected.
CONNECTED COMPONENTS
========================================================================
Component 1:       Component 2:       Component 3:
A --- B            E --- F            H
|   / |                  |
|  /  |                  G
C --- D
3 connected components
Vertices: {A,B,C,D} ∪ {E,F,G} ∪ {H}
========================================================================
Proposition. A connected graph on $n$ vertices has at least $n - 1$ edges. Having $n - 1$ edges is necessary but not sufficient for connectivity - trees achieve exactly $n - 1$ edges, while a triangle plus an isolated vertex also has $n - 1$ edges yet is disconnected.
Proposition (Component count from Laplacian). The number of connected components equals the multiplicity of eigenvalue 0 in the graph Laplacian $L = D - A$. This is a deep result connecting connectivity (combinatorial) to spectrum (algebraic).
For AI: If a GNN's input graph has multiple connected components, information cannot flow between components through message passing. Each component is processed independently. In molecular graphs, a molecule with disconnected fragments (e.g., a salt like NaCl in solution) has multiple components - the GNN must handle each separately or use a virtual node to connect them.
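Both facts are easy to check numerically. A minimal sketch, assuming networkx and numpy are available (the edge list is one reading of the component diagram above):

```python
import networkx as nx
import numpy as np

# the three-component graph from the diagram above
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
              ("E", "F"), ("F", "G")])
G.add_node("H")  # isolated vertex

print(nx.number_connected_components(G))             # 3
print([sorted(c) for c in nx.connected_components(G)])

# Laplacian check: multiplicity of eigenvalue 0 equals the component count
L = nx.laplacian_matrix(G).toarray().astype(float)
eigenvalues = np.linalg.eigvalsh(L)
print(int(np.sum(np.isclose(eigenvalues, 0.0))))     # 3
```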
5.2 Strongly and Weakly Connected Digraphs
For directed graphs, connectivity bifurcates into two notions:
Definition (Strongly Connected). A digraph is strongly connected if for every pair $u, v \in V$, there exists a directed path from $u$ to $v$ AND a directed path from $v$ to $u$.
Definition (Weakly Connected). A digraph is weakly connected if the underlying undirected graph (obtained by ignoring edge directions) is connected.
Definition (Strongly Connected Component / SCC). A strongly connected component is a maximal strongly connected subgraph.
STRONG vs. WEAK CONNECTIVITY
========================================================================
A ---> B ---> C ---> D      Weakly connected:   YES (underlying
^             |                 undirected graph is connected)
+-------------+
SCCs: {A, B, C} and {D}     Strongly connected: NO (no path D -> A)
Within {A, B, C}: A->B->C->A forms a directed cycle
(strongly connected)
========================================================================
Example: A citation network is weakly connected (ignoring edge directions, you can reach most papers) but not strongly connected (you cannot follow citation edges from a 2023 paper forward to the 2024 paper that cites it - citations only point backward in time).
Example (Condensation DAG). Given the SCC decomposition, we can form the condensation of a digraph: contract each SCC to a single vertex. The resulting graph is always a DAG (if it had a directed cycle, the SCCs involved would merge into one). This DAG reveals the macroscopic flow structure of the digraph.
Algorithms for SCCs. Tarjan's algorithm (1972) and Kosaraju's algorithm both find all SCCs in $O(n + m)$ time using DFS. These are covered in detail in 03 Graph Algorithms.
For AI: The SCC decomposition of a computation graph reveals which operations form feedback loops. In standard feedforward networks, each SCC is a single vertex (the graph is a DAG). In architectures with weight sharing (like RNNs unrolled over time), understanding the SCC structure helps with gradient flow analysis. The condensation DAG of a knowledge graph reveals the hierarchical structure of entity relationships.
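A minimal sketch of the SCC decomposition and the condensation DAG, assuming networkx (the digraph matches the strong-vs-weak diagram above):

```python
import networkx as nx

# the digraph from the diagram: a 3-cycle A->B->C->A plus C->D
D = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

sccs = list(nx.strongly_connected_components(D))
print([sorted(s) for s in sccs])          # [['A', 'B', 'C'], ['D']]
print(nx.is_weakly_connected(D))          # True
print(nx.is_strongly_connected(D))        # False

C = nx.condensation(D)                    # contract each SCC to one vertex
print(nx.is_directed_acyclic_graph(C))    # True - the condensation is a DAG
```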
5.3 Bridges and Articulation Points
Definition (Bridge). An edge $e$ is a bridge (or cut edge) if removing $e$ increases the number of connected components.
Definition (Articulation Point). A vertex $v$ is an articulation point (or cut vertex) if removing $v$ and all its incident edges increases the number of connected components.
Example: In the path graph $P_4 = 1 - 2 - 3 - 4$, every edge is a bridge and vertices 2 and 3 are articulation points. Removing edge $\{2, 3\}$ disconnects $\{1, 2\}$ from $\{3, 4\}$.
Example (non-bridge): In the cycle $C_n$, no edge is a bridge - removing any single edge leaves the graph connected (as a path). No vertex is an articulation point.
Proposition. An edge $e = \{u, v\}$ is a bridge if and only if $e$ is not contained in any cycle. Equivalently, $e$ is a bridge iff there is no alternative path from $u$ to $v$ avoiding $e$.
Definition (Biconnected Graph). A connected graph with no articulation points is biconnected. Equivalently (for graphs on at least 3 vertices), between any two vertices there are at least two internally vertex-disjoint paths.
For AI: Bridges and articulation points identify structural vulnerabilities in networks. In a knowledge graph, an articulation point is an entity whose removal disconnects parts of the knowledge base. In network analysis for social media, bridges between communities are the edges that enable information flow between groups.
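Both structures are computable in linear time via DFS; a minimal networkx sketch (the example graph - a path attached to a triangle - is illustrative, not taken from the text above):

```python
import networkx as nx

# path 1-2-3-4 with a triangle {4, 5, 6} attached at vertex 4
G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 4)])

print(list(nx.bridges(G)))               # [(1, 2), (2, 3), (3, 4)]
print(list(nx.articulation_points(G)))   # vertices 2, 3, 4
```

The triangle's edges lie on a cycle, so none of them is a bridge - consistent with the proposition above.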
5.4 Vertex and Edge Connectivity
Definition (Vertex Connectivity). The vertex connectivity $\kappa(G)$ is the minimum number of vertices whose removal disconnects $G$ (or reduces it to a single vertex). A graph is $k$-connected if $\kappa(G) \ge k$.
Definition (Edge Connectivity). The edge connectivity $\lambda(G)$ is the minimum number of edges whose removal disconnects $G$.
Theorem (Whitney, 1932). For any graph $G$:
$$\kappa(G) \le \lambda(G) \le \delta(G)$$
where $\delta(G)$ is the minimum degree.
Theorem (Menger, 1927). For non-adjacent vertices $u, v$, the maximum number of internally vertex-disjoint $u$-$v$ paths equals the minimum number of vertices whose removal disconnects $u$ from $v$.
This is a max-flow min-cut result: the maximum "flow" of disjoint paths equals the minimum "cut" of vertices. The edge version: the maximum number of edge-disjoint $u$-$v$ paths equals the minimum edge cut separating $u$ from $v$.
Example of Whitney's inequality. Consider the "bowtie" graph - two triangles sharing one vertex:
A     D       deg(C) = 4; every other vertex has degree 2
|\   /|       δ(G) = 2  (minimum degree)
| \ / |       λ(G) = 2  (remove the two edges {C,D} and {C,E})
|  C  |       κ(G) = 1  (remove the shared vertex C)
| / \ |
|/   \|
B     E
So $\kappa(G) = 1 < \lambda(G) = 2 = \delta(G)$. The first inequality is strict.
Achieving equality. If $G$ is $k$-regular and $k$-edge-connected, then $\lambda(G) = \delta(G) = k$. Complete graphs achieve $\kappa(K_n) = \lambda(K_n) = \delta(K_n) = n - 1$ (the maximum possible).
For AI: Menger's theorem is the combinatorial version of the max-flow min-cut theorem, which is fundamental to network flow algorithms used in image segmentation (graph cuts), assignment problems, and scheduling. The edge connectivity measures graph robustness: how many links must fail before the network breaks. This is directly applicable to reliability analysis of distributed systems and communication networks.
5.5 Connectivity in AI
Computation graph connectivity. A computation graph must be connected (in the weakly-connected sense for digraphs) for gradients to flow from the loss to all parameters. If a parameter's computation is disconnected from the loss, it receives no gradient - it cannot be trained. In PyTorch, such a parameter's `.grad` simply remains `None` after `backward()`.
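A tiny PyTorch sketch of this failure mode (the parameter names are illustrative):

```python
import torch

w_used = torch.nn.Parameter(torch.randn(3))
w_unused = torch.nn.Parameter(torch.randn(3))   # never touches the loss

x = torch.randn(3)
loss = (w_used * x).sum()   # the computation graph reaches only w_used
loss.backward()

print(w_used.grad)     # a gradient tensor
print(w_unused.grad)   # None - no path from the loss to this parameter
```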
Knowledge graph connectivity. The "reachability" of entities in a knowledge graph determines what a RAG system can retrieve. If the query entity and the answer entity are in different connected components, no amount of graph traversal will find the answer.
GNN receptive field. A GNN with $L$ layers can propagate information across paths of length at most $L$. If the graph diameter exceeds $L$, some vertex pairs cannot exchange information. This is the "under-reaching" problem, complementary to the "over-smoothing" problem (too many layers cause all representations to converge).
Molecular connectivity. In molecular graphs, the number of connected components indicates whether the input is a single molecule or a mixture/complex. Graph-level prediction tasks (e.g., molecular property prediction) typically assume a single connected component.
6. Special Graph Families
6.1 Complete Graphs
Definition (Complete Graph $K_n$). The complete graph on $n$ vertices is the graph where every pair of distinct vertices is connected by an edge: $E = \{\{u, v\} : u, v \in V,\ u \neq v\}$.
Properties of $K_n$:
- $(n-1)$-regular (every vertex has degree $n - 1$)
- Diameter 1 (every pair is adjacent)
- Number of edges: $m = \binom{n}{2} = \frac{n(n-1)}{2}$
- Number of triangles: $\binom{n}{3}$
- Number of spanning trees: $n^{n-2}$ (Cayley's formula)
- Chromatic number: $\chi(K_n) = n$
- Hamiltonian for $n \ge 3$; Eulerian for odd $n$ (all degrees $n - 1$ are even exactly when $n$ is odd)
COMPLETE GRAPHS
========================================================================
K_1      K_2       K_3        K_4         K_5
 *       *---*     *---*      *---*    (all 10 pairs of the
                    \  |      |\ /|     5 vertices joined;
                     \ |      | X |     crossings unavoidable
                      \|      |/ \|     in the plane)
                       *      *---*
m=0      m=1       m=3        m=6         m=10
========================================================================
For AI: The fully-connected attention mechanism in transformers computes attention weights over all $n^2$ ordered token pairs (where $n$ is the sequence length). Every token attends to every other token - this is exactly message passing on the complete graph $K_n$ (with self-loops). The $O(n^2)$ complexity of self-attention is the cost of processing $K_n$.
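A minimal PyTorch sketch of the quadratic pattern (no learned projections - the shapes, not the weights, are the point):

```python
import torch

n, d = 5, 8                       # sequence length, embedding dimension
X = torch.randn(n, d)             # one vector per token

scores = X @ X.T / d ** 0.5       # n x n scores: one per ordered token pair
attn = scores.softmax(dim=-1)     # dense "adjacency" over K_n (plus self-loops)
out = attn @ X                    # message passing on the complete graph

print(attn.shape)                 # torch.Size([5, 5]) - the O(n^2) cost
```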
6.2 Bipartite Graphs
Definition (Bipartite Graph). A graph $G = (V, E)$ is bipartite if $V$ can be partitioned into two disjoint sets $U$ and $W$ such that every edge connects a vertex in $U$ to a vertex in $W$: $E \subseteq \{\{u, w\} : u \in U,\ w \in W\}$.
Theorem (Odd-Cycle Characterisation). A graph is bipartite if and only if it contains no odd-length cycle.
Proof sketch. ($\Rightarrow$) If $G$ is bipartite with parts $U, W$, any cycle must alternate between $U$ and $W$, requiring even length. ($\Leftarrow$) If $G$ has no odd cycle, pick any vertex $s$ and colour by distance: $U = \{v : d(s, v) \text{ even}\}$, $W = \{v : d(s, v) \text{ odd}\}$. No edge can connect two vertices whose distances from $s$ have the same parity (such an edge would close an odd cycle).
Definition (Complete Bipartite Graph $K_{m,n}$). The bipartite graph with parts of size $m$ and $n$ where every vertex in one part is connected to every vertex in the other. It has $m + n$ vertices and $mn$ edges.
BIPARTITE GRAPH
========================================================================
General bipartite:               Complete bipartite K_{2,3}:
U:   *         *                 every vertex of U joined to
    / \        |                 every vertex of W
   /   \       |                 (2 x 3 = 6 edges)
W: o     o     o
Partition: U (filled *), W (open o); every edge crosses between U and W.
========================================================================
Examples:
- User-item interaction graphs (recommender systems): users and items are two parts
- Bipartite matching: job assignment (workers <-> tasks)
- Dependency parse trees (like all trees, they are bipartite: no cycles at all, hence no odd cycles)
Non-example: $K_3$ (the triangle) is NOT bipartite - it contains an odd cycle of length 3.
Non-example: $C_5$ (the pentagon) is NOT bipartite - it is an odd cycle of length 5.
Testing bipartiteness. BFS provides an efficient bipartiteness test: start BFS from any vertex, alternating colours between levels. If any edge connects two same-colour vertices, the graph is not bipartite (and that edge, combined with the BFS tree paths, gives an odd cycle certificate). This is covered in 03 Graph Algorithms.
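A minimal sketch of the BFS 2-colouring test in plain Python (`is_bipartite` here is an illustrative standalone helper; adjacency is given as a dict):

```python
from collections import deque

def is_bipartite(adj):
    """2-colour via BFS; adj maps each vertex to a list of neighbours."""
    colour = {}
    for start in adj:                        # loop handles disconnected graphs
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]    # alternate colours by BFS level
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False                 # same-colour edge -> odd cycle
    return True

print(is_bipartite({0: [1], 1: [0, 2], 2: [1]}))         # path P_3 -> True
print(is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}))   # triangle -> False
```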
Theorem (Konig, 1931). In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover.
This is a foundational result in combinatorial optimisation. A matching is a set of edges with no shared vertices. A vertex cover is a set of vertices that includes at least one endpoint of every edge. Konig's theorem does NOT hold for non-bipartite graphs (e.g., $K_3$ has maximum matching size 1 but minimum vertex cover size 2).
Bipartite adjacency matrix structure. The adjacency matrix of a bipartite graph with parts $U$ ($p$ vertices) and $W$ ($q$ vertices) has a block structure:
$$A = \begin{pmatrix} 0 & B \\ B^\top & 0 \end{pmatrix}$$
where $B \in \{0, 1\}^{p \times q}$ is the biadjacency matrix. This block structure has important spectral consequences: the eigenvalues of $A$ are symmetric about 0 (if $\lambda$ is an eigenvalue, so is $-\lambda$).
For AI: Recommender systems model user-item interactions as bipartite graphs. Matrix factorisation for recommendation (the Netflix Prize approach) is equivalent to low-rank approximation of the biadjacency matrix $B$. Knowledge graphs with subject-predicate-object triples can be viewed as bipartite between entities and relations. The block structure of the bipartite adjacency matrix is exploited in bipartite GNN architectures for recommendation (PinSage, Ying et al., 2018).
6.3 Trees and Forests
Definition (Tree). A tree is a connected acyclic graph.
Definition (Forest). A forest is an acyclic graph (not necessarily connected). Each connected component of a forest is a tree.
Theorem (Tree Equivalences). For a graph $G$ on $n$ vertices, the following are equivalent:
- $G$ is a tree (connected and acyclic)
- $G$ is connected and has exactly $n - 1$ edges
- $G$ is acyclic and has exactly $n - 1$ edges
- There is exactly one path between every pair of vertices
- $G$ is connected, but removing any edge disconnects it (every edge is a bridge)
- $G$ is acyclic, but adding any edge creates exactly one cycle
Proof (1 $\Leftrightarrow$ 2). ($\Rightarrow$) A connected graph on $n$ vertices has at least $n - 1$ edges. If a tree had $n$ or more edges, it would contain a cycle (grow a spanning tree, which uses $n - 1$ edges; any additional edge closes a cycle) - contradicting acyclicity. So a tree has exactly $n - 1$ edges. ($\Leftarrow$) A connected graph with exactly $n - 1$ edges has no "spare" edges, so no cycles.
Proof (1 $\Rightarrow$ 4). Suppose $G$ is a tree and there exist two distinct paths $P_1$ and $P_2$ from $u$ to $v$. Since $P_1 \neq P_2$, they diverge at some vertex $x$ and rejoin later at some vertex $y$. The subpaths from $x$ to $y$ via $P_1$ and via $P_2$ form a cycle - contradicting acyclicity. So the path is unique.
Proof (4 $\Rightarrow$ 5). If there is exactly one path between every pair, then removing any edge $\{u, v\}$ destroys the unique $u$-$v$ path, disconnecting $u$ from $v$. So every edge is a bridge.
TREE EQUIVALENCES - VISUAL SUMMARY
========================================================================
Connected + Acyclic Exactly one Connected, all
= TREE path between edges are bridges
| any pair |
| n-1 edges | |
v (connected) v v
+------------------------------------------------------+
| ALL SIX CONDITIONS EQUIVALENT |
| (any one implies all the others) |
+------------------------------------------------------+
^ ^ ^
| n-1 edges | |
| (acyclic) Adding any edge Maximal acyclic
| creates exactly = minimal connected
one cycle
========================================================================
Theorem (Cayley's Formula, 1889). The number of labelled trees on $n$ vertices - equivalently, of spanning trees of the complete graph $K_n$ - is $n^{n-2}$.
| $n$ | $n^{n-2}$ | Spanning trees of $K_n$ |
|---|---|---|
| 1 | 1 | single vertex |
| 2 | 1 | single edge |
| 3 | 3 | triangle minus any one of its 3 edges |
| 4 | 16 | |
| 5 | 125 | |
Theorem (Kirchhoff's Matrix Tree Theorem). The number of spanning trees of any connected graph equals any cofactor of the Laplacian matrix $L = D - A$. Specifically, it equals $\det(\tilde{L})$ where $\tilde{L}$ is $L$ with any one row and the corresponding column deleted.
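The theorem turns spanning-tree counting into a determinant - a minimal numpy sketch, checked against Cayley's formula for $K_4$:

```python
import numpy as np

def spanning_tree_count(A):
    """Kirchhoff: number of spanning trees = any cofactor of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return round(np.linalg.det(L[1:, 1:]))   # delete row 0 and column 0

# K_4: Cayley's formula predicts 4^(4-2) = 16 spanning trees
A_K4 = np.ones((4, 4)) - np.eye(4)
print(spanning_tree_count(A_K4))   # 16
```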
Definition (Spanning Tree). A spanning tree of a connected graph is a tree that includes all vertices of . Every connected graph has at least one spanning tree.
Definition (Rooted Tree). A tree with a designated root vertex. Edges are implicitly directed away from (or toward) the root. Rooted trees define a parent-child hierarchy.
Examples:
- Parse trees in NLP: dependency trees, constituency trees
- File system directory structures
- Decision trees in machine learning
- Binary search trees, B-trees, tries
For AI: Trees are fundamental structures in AI:
- Decision trees (CART, Random Forest, XGBoost) - each internal node is a split, each leaf is a prediction
- Abstract syntax trees (ASTs) - code represented as trees for program analysis and code generation
- Spanning trees of graphs - GNN approaches sometimes use tree decomposition for efficient message passing
- Hierarchical clustering dendrograms - trees representing cluster merging
6.4 DAGs (Directed Acyclic Graphs)
Definition (DAG). A directed acyclic graph is a directed graph with no directed cycles.
Theorem (Topological Ordering). A directed graph has a topological ordering if and only if it is a DAG.
A topological ordering is a linear ordering of the vertices such that for every directed edge $(u, v)$, vertex $u$ comes before $v$ in the ordering. A DAG may have multiple valid topological orderings.
DAG AND TOPOLOGICAL ORDER
========================================================================
DAG:                      Topological orderings:
A ---> B ---> D           A, B, C, D, E
|      |                  A, C, B, D, E
v      v                  A, C, B, E, D
C ---> E
Every directed path respects the ordering
========================================================================
For AI: DAGs are arguably the most important graph type in AI:
- Computation graphs. Every feedforward neural network is a DAG. Forward evaluation proceeds in topological order; backpropagation proceeds in reverse topological order.
- Bayesian networks. A Bayesian network is a DAG where vertices represent random variables and edges represent conditional dependencies. The joint distribution factorises according to the DAG structure: $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$.
- Causal graphs. Structural causal models (Pearl, 2009) represent causal relationships as DAGs. The direction of edges encodes causal direction: $X \to Y$ means "$X$ causes $Y$."
- Task scheduling. Build systems (Make, Bazel), data pipelines (Airflow), and training pipelines all model dependencies as DAGs.
Topological sort algorithm (Kahn, 1962):
KAHN'S TOPOLOGICAL SORT
========================================================================
Input: DAG G = (V, E)
1. Compute in-degree of every vertex
2. Enqueue all vertices with in-degree 0
3. While queue is non-empty:
a. Dequeue vertex u, add to output
b. For each successor v of u:
reduce in-degree(v) by 1
if in-degree(v) == 0: enqueue v
4. If output has fewer than n vertices: G has a cycle (not a DAG)
Time: O(n + m) Space: O(n)
========================================================================
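A direct Python translation of the boxed pseudocode (the dict-of-successors encoding, with every vertex as a key, is an assumption of this sketch):

```python
from collections import deque

def topological_sort(succ):
    """Kahn's algorithm; succ maps each vertex to a list of successors.
    Returns a topological order, or None if the graph has a cycle."""
    indeg = {u: 0 for u in succ}
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    queue = deque(u for u in succ if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(succ) else None

# the DAG from the diagram above: A->B, A->C, B->D, B->E, C->E
print(topological_sort({"A": ["B", "C"], "B": ["D", "E"],
                        "C": ["E"], "D": [], "E": []}))
# ['A', 'B', 'C', 'D', 'E'] - one of the valid orderings
```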
Unique topological ordering. A DAG has a unique topological ordering if and only if it has a Hamiltonian path - a directed path visiting all vertices. Equivalently, at every step of Kahn's algorithm, there is exactly one vertex with in-degree 0.
6.5 Planar Graphs
Definition (Planar Graph). A graph is planar if it can be drawn in the plane without edge crossings.
Theorem (Euler's Formula for Planar Graphs). For a connected planar graph with $n$ vertices, $m$ edges, and $f$ faces (including the outer face):
$$n - m + f = 2$$
Proof (by induction on $m$). Base case: a tree has $m = n - 1$ edges and $f = 1$ face (the outer face), so $n - m + f = n - (n - 1) + 1 = 2$. Inductive step: take a connected planar graph containing a cycle. Removing an edge that lies on a cycle reduces $m$ by 1 and merges two faces, reducing $f$ by 1. The quantity $n - m + f$ is unchanged.
Corollary. For a simple planar graph with $n \ge 3$: $m \le 3n - 6$. This follows from Euler's formula plus the fact that each face is bounded by at least 3 edges, and each edge borders at most 2 faces.
Corollary (bipartite planar). For a simple bipartite planar graph with $n \ge 3$: $m \le 2n - 4$. (Each face is bounded by at least 4 edges, since bipartite graphs have no odd cycles.)
Theorem (Kuratowski, 1930). A graph is planar if and only if it does not contain a subdivision of $K_5$ or $K_{3,3}$ as a subgraph.
Examples:
- $K_4$ is planar (it can be drawn as a triangle with one vertex inside). Check: $n = 4$, $m = 6$, $3n - 6 = 6$, so the bound $m \le 3n - 6$ holds.
- $K_5$ is NOT planar. Check: $n = 5$, $m = 10$, but $3n - 6 = 9 < 10$. Euler's bound is violated.
- $K_{3,3}$ is NOT planar. Check: $n = 6$, $m = 9$, bipartite bound $2n - 4 = 8 < 9$. Violated.
K_4 PLANAR EMBEDDING
========================================================================
1 Euler check:
/|\ n = 4 vertices
/ | \ m = 6 edges
2--+--3 f = 4 faces (3 triangles + outer)
\ | / 4 - 6 + 4 = 2 OK
\|/
4
========================================================================
For AI: Planar graphs arise in geographic information systems (road networks, administrative boundaries), VLSI circuit design (wires must not cross), and mesh-based physics simulations (finite element methods for weather prediction and fluid dynamics). Graph drawing algorithms that minimise edge crossings are used in knowledge graph visualisation. The planarity constraint is also useful as a sparsity prior - planar graphs are sparse ($m \le 3n - 6 = O(n)$), making them computationally tractable for exact GNN inference.
6.6 Hypergraphs
Definition (Hypergraph). A hypergraph is a pair $H = (V, E)$ where $E$ is a set of hyperedges - each hyperedge is a subset of $V$ that may contain more than 2 vertices.
A standard graph is a hypergraph where every hyperedge has exactly 2 vertices.
Definition ($k$-uniform hypergraph). A hypergraph where every hyperedge has exactly $k$ vertices. Standard graphs are 2-uniform hypergraphs.
Example: In a co-authorship network, a paper with 5 authors creates a 5-element hyperedge connecting all 5 authors simultaneously - this captures the "group" relationship that pairwise edges cannot.
Bipartite representation. Any hypergraph can be represented as a bipartite graph - one part for vertices, one for hyperedges, with edges representing membership. This is the incidence graph of the hypergraph and enables standard graph algorithms to operate on hypergraphs.
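A minimal networkx sketch of the incidence-graph construction (the hyperedge names `e1`-`e3` are illustrative):

```python
import networkx as nx

# a toy hypergraph: three hyperedges over five vertices
hyperedges = {"e1": {"A", "B", "C"}, "e2": {"C", "D"}, "e3": {"A", "D", "E"}}

# incidence graph: vertices on one side, hyperedges on the other,
# with an edge for each membership relation
B = nx.Graph()
for name, members in hyperedges.items():
    for v in members:
        B.add_edge(v, name)

print(nx.is_bipartite(B))                 # True
print(B.degree("e1"))                     # 3 - the size of hyperedge e1
```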
For AI: Hypergraphs naturally model higher-order interactions:
- Attention mechanisms compute interactions over sets of tokens - multi-head attention can be viewed as a learned hypergraph where each attention head defines hyperedges
- Higher-order message passing in Hypergraph Neural Networks (Feng et al., 2019) propagates messages along hyperedges
- Set functions like DeepSets (Zaheer et al., 2017) process unordered sets - each set is a hyperedge
Note: Hypergraph theory is a deep and active research area. This section provides only an introduction. The key takeaway is that standard graphs capture pairwise relationships; hypergraphs capture group relationships.
7. Subgraphs and Graph Operations
7.1 Subgraphs and Induced Subgraphs
Definition (Subgraph). A graph $H = (V', E')$ is a subgraph of $G = (V, E)$ if $V' \subseteq V$ and $E' \subseteq E$. We write $H \subseteq G$.
Definition (Induced Subgraph). Given a vertex subset $S \subseteq V$, the induced subgraph $G[S]$ is the subgraph with vertex set $S$ and all edges of $G$ whose both endpoints are in $S$: $E(G[S]) = \{\{u, v\} \in E : u, v \in S\}$.
The key difference: a subgraph can choose any subset of edges, but an induced subgraph must include ALL edges of $G$ between vertices in $S$.
Definition (Spanning Subgraph). A subgraph $H$ is spanning if $V' = V$ - it has the same vertex set as $G$ but possibly fewer edges.
SUBGRAPH vs. INDUCED SUBGRAPH
========================================================================
Original G:         Subgraph H:         Induced G[{A,B,C}]:
A --- B             A --- B             A --- B
|\    |             |                    \    |
| \   |             |                     \   |
D --- C             D     C                \  |
                                            C
All edges present   Missing edges OK    (includes edge A-C because
                                         both endpoints are in {A,B,C})
========================================================================
For AI: Mini-batch training for GNNs works by sampling induced subgraphs. GraphSAGE (Hamilton et al., 2017) samples $k$-hop neighbourhoods as induced subgraphs. The choice between subgraph and induced subgraph affects whether all edges within the sample are preserved - induced subgraphs preserve all local structure.
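networkx exposes both notions directly: `Graph.subgraph` returns the induced subgraph, while `Graph.edge_subgraph` keeps only the chosen edges. A minimal sketch on the graph from the diagram above:

```python
import networkx as nx

# the graph G from the diagram above
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")])

induced = G.subgraph(["A", "B", "C"])                 # keeps ALL edges within {A,B,C}
partial = G.edge_subgraph([("A", "B"), ("A", "D")])   # keeps only the chosen edges

print(sorted(induced.edges()))   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(sorted(partial.edges()))   # [('A', 'B'), ('A', 'D')]
```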
7.2 Graph Complement
Definition (Graph Complement). The complement of a simple graph $G = (V, E)$ is the graph $\bar{G} = (V, \bar{E})$ on the same vertex set with complementary edge set: $\bar{E} = \{\{u, v\} : u \neq v,\ \{u, v\} \notin E\}$.
Two vertices are adjacent in $\bar{G}$ if and only if they are NOT adjacent in $G$.
Properties:
- $G$ and $\bar{G}$ together form $K_n$: $E \cup \bar{E} = E(K_n)$ and $E \cap \bar{E} = \emptyset$
- $\bar{\bar{G}} = G$ (complement of complement is the original)
- The complement of the complete graph $K_n$ is the empty graph on $n$ vertices
- A graph is self-complementary if $G \cong \bar{G}$ (e.g., $P_4$, $C_5$)
For AI: Complement graphs appear in constraint satisfaction. If $G$ represents conflicts (vertices that cannot be assigned the same resource), then $\bar{G}$ represents compatibility. Finding a maximum clique in $\bar{G}$ is equivalent to finding a maximum independent set in $G$.
7.3 Graph Union, Intersection, and Product
Definition (Graph Union). For graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$: $G_1 \cup G_2 = (V_1 \cup V_2,\ E_1 \cup E_2)$.
Definition (Graph Intersection). $G_1 \cap G_2 = (V_1 \cap V_2,\ E_1 \cap E_2)$.
Definition (Cartesian Product $G \,\square\, H$). Vertex set $V(G) \times V(H)$. Two vertices $(u, v)$ and $(u', v')$ are adjacent if:
- $u = u'$ and $\{v, v'\} \in E(H)$, or
- $v = v'$ and $\{u, u'\} \in E(G)$
Example: $P_2 \,\square\, P_3$ produces a $2 \times 3$ grid graph. The hypercube $Q_d = K_2 \,\square\, K_2 \,\square\, \cdots \,\square\, K_2$ ($d$ times) has $2^d$ vertices with $d$-bit binary strings as vertex labels.
CARTESIAN PRODUCT P_2 □ P_3 (2 x 3 grid)
========================================================================
P_2:  A-B        P_3:  1-2-3
Product vertices: (A,1), (A,2), (A,3), (B,1), (B,2), (B,3)
(A,1)-(A,2)-(A,3)      <- each row is a copy of P_3
  |     |     |        <- each column is a copy of P_2
(B,1)-(B,2)-(B,3)
Result: 6 vertices, 7 edges (a 2 x 3 grid)
========================================================================
Properties of graph products:
| Product | Symbol | Adjacency condition |
|---|---|---|
| Cartesian | $G \,\square\, H$ | Same in one coordinate, adjacent in the other |
| Tensor | $G \times H$ | Adjacent in both coordinates |
| Strong | $G \boxtimes H$ | Cartesian OR tensor condition |
| Lexicographic | $G[H]$ | Adjacent in $G$, OR same in $G$-coordinate and adjacent in $H$ |

All four products share the vertex set $V(G) \times V(H)$; they differ only in which pairs become adjacent.
Definition (Tensor (Categorical) Product $G \times H$). Vertex set $V(G) \times V(H)$. Two vertices $(u, v)$ and $(u', v')$ are adjacent if $\{u, u'\} \in E(G)$ AND $\{v, v'\} \in E(H)$.
For AI: Graph products appear in constructing higher-order graph structures. The tensor product of a graph with itself encodes 2-walk patterns: $A(G \times G) = A(G) \otimes A(G)$, counting simultaneous walks. In the theory of GNN expressiveness, the $k$-WL test colours $k$-tuples of vertices - equivalent to operating on the $k$-fold tensor product $G^{\times k}$. Grid graphs (Cartesian products of paths) arise in image processing (pixels as vertices) and in mesh-based physics simulations.
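The matrix identities behind the two main products - $A(G \times H) = A_G \otimes A_H$ for the tensor product and $A(G \,\square\, H) = A_G \otimes I + I \otimes A_H$ for the Cartesian product - can be checked with numpy's Kronecker product. A minimal sketch:

```python
import numpy as np

A_P2 = np.array([[0, 1], [1, 0]])                     # path on 2 vertices
A_P3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])    # path on 3 vertices

# tensor product: adjacent in BOTH coordinates
A_tensor = np.kron(A_P2, A_P3)
print(A_tensor.shape, int(A_tensor.sum()) // 2)       # (6, 6), 4 edges

# Cartesian product: adjacent in exactly one coordinate
A_cart = (np.kron(A_P2, np.eye(3, dtype=int))
          + np.kron(np.eye(2, dtype=int), A_P3))
print(int(A_cart.sum()) // 2)                         # 7 edges - the 2x3 grid above
```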
7.4 Graph Minor and Subdivision
Definition (Subdivision). A subdivision of a graph is obtained by inserting new vertices into edges - replacing an edge $\{u, v\}$ with a path $u - w - v$ through a new vertex $w$.
Definition (Graph Minor). A graph $H$ is a minor of $G$ if $H$ can be obtained from $G$ by a sequence of vertex deletions, edge deletions, and edge contractions. Edge contraction merges two adjacent vertices into one.
Theorem (Robertson-Seymour Graph Minor Theorem, 2004). In any infinite sequence of finite graphs, one is a minor of another. Equivalently, every minor-closed graph property can be characterised by a finite set of forbidden minors.
This is one of the deepest results in graph theory, proved over 20 papers spanning 1983-2004. Wagner's theorem - the minor form of Kuratowski's planarity criterion - is a special case: planarity is characterised by exactly two forbidden minors, $K_5$ and $K_{3,3}$.
For AI: Graph minor theory is primarily of theoretical interest in AI, but it provides the foundation for treewidth - a measure of how "tree-like" a graph is. Many NP-hard graph problems become polynomial-time on graphs of bounded treewidth, which is exploited in exact inference for probabilistic graphical models.
7.5 Line Graphs
Definition (Line Graph $L(G)$). The line graph $L(G)$ of a graph $G$ has one vertex for each edge of $G$. Two vertices in $L(G)$ are adjacent if the corresponding edges in $G$ share an endpoint.
Example: If $G$ is the path $1 - 2 - 3 - 4$ with edges $a = \{1,2\}$, $b = \{2,3\}$, $c = \{3,4\}$, then $L(G)$ has vertices $\{a, b, c\}$ with edges $\{a, b\}$ and $\{b, c\}$ - another path!
Properties:
- $L(G)$ has $m$ vertices (one per edge of $G$) and $\sum_{v \in V} \binom{\deg(v)}{2}$ edges
- $L(K_n)$ is the triangular graph $T(n)$ - the complement of the Kneser graph $K(n, 2)$ - with $\binom{n}{2}$ vertices
For AI: Line graphs convert edge-level problems into vertex-level problems. Since most GNN frameworks operate on vertices, transforming a graph to its line graph enables edge-centric tasks (link prediction, edge classification) to be solved with standard vertex-classification GNNs.
8. Graph Isomorphism and Invariants
8.1 Graph Isomorphism
Definition (Graph Isomorphism). Two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic, written $G_1 \cong G_2$, if there exists a bijection $\phi: V_1 \to V_2$ such that:
$$\{u, v\} \in E_1 \iff \{\phi(u), \phi(v)\} \in E_2$$
The bijection $\phi$ is called an isomorphism. Intuitively, two graphs are isomorphic if they have the same structure - the same connectivity pattern - just with different vertex labels.
GRAPH ISOMORPHISM
========================================================================
G_1:            G_2:            Isomorphism φ:
A --- B         1 --- 3         A -> 1
|     |         |     |         B -> 3
|     |         |     |         C -> 4
D --- C         2 --- 4         D -> 2
Same structure (cycle C_4), different labels
========================================================================
Verifying an isomorphism - worked example.
G_1:            G_2:            Claimed isomorphism φ:
A --- B         1 --- 3         A -> 1, B -> 3
|     |         |     |         C -> 4, D -> 2
|     |         |     |
D --- C         2 --- 4
Verify: for each edge in $G_1$, check that its image is an edge in $G_2$:
| Edge in $G_1$ | Image under $\phi$ | Edge in $G_2$? |
|---|---|---|
| $\{A, B\}$ | $\{1, 3\}$ | Yes |
| $\{B, C\}$ | $\{3, 4\}$ | Yes |
| $\{C, D\}$ | $\{4, 2\}$ | Yes |
| $\{D, A\}$ | $\{2, 1\}$ | Yes |
All 4 edges map correctly and $\phi$ is a bijection; since both graphs have exactly 4 edges, non-edges correspond as well. Therefore $G_1 \cong G_2$. Both are $C_4$.
Proving non-isomorphism. To show $G_1 \not\cong G_2$, it suffices to find any graph invariant that differs. The quickest method: check the degree sequence. If degree sequences differ, the graphs cannot be isomorphic. If they agree, check further invariants (girth, number of triangles, spectrum).
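A minimal networkx sketch of both checks - verifying the mapping above edge by edge, and the quick degree-sequence screen:

```python
import networkx as nx

G1 = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
G2 = nx.Graph([(1, 3), (3, 4), (4, 2), (2, 1)])

# verify the claimed mapping: every edge image must be an edge
phi = {"A": 1, "B": 3, "C": 4, "D": 2}
assert all(G2.has_edge(phi[u], phi[v]) for u, v in G1.edges())
print(nx.is_isomorphic(G1, G2))   # True - both are C_4

# quick non-isomorphism screen: compare sorted degree sequences first
deg = lambda G: sorted(d for _, d in G.degree())
print(deg(G1) == deg(G2))         # equal here; a difference would prove non-isomorphism
```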
Computational complexity. The graph isomorphism problem is one of the few natural problems in NP that is neither known to be in P nor known to be NP-complete. Babai (2016) showed it can be solved in quasi-polynomial time $2^{(\log n)^{O(1)}}$, but no polynomial-time algorithm is known.
For AI: Graph isomorphism is central to GNN theory. A GNN that is invariant to vertex permutation produces the same output for isomorphic graphs. This is a design requirement: the molecular property of caffeine should not depend on how we number its atoms. Formally, if $f$ is a GNN, $A$ is the adjacency matrix, and $P$ is a permutation matrix, then invariance means $f(PAP^\top) = f(A)$. Equivariance is a related concept: $f(PAP^\top) = P\,f(A)$. Invariance is needed for graph-level tasks; equivariance is needed for node-level tasks (the output at each node should transform consistently with input permutations).
8.2 Graph Invariants
A graph invariant is a property or quantity that is the same for all isomorphic graphs. Graph invariants provide necessary (but generally not sufficient) conditions for isomorphism.
| Invariant | Description | Distinguishes |
|---|---|---|
| $n$ | Number of vertices | Graphs of different order |
| $m$ | Number of edges | Graphs with different size |
| Degree sequence | Sorted list of degrees | Many non-isomorphic pairs |
| Diameter | Max distance between vertices | Some pairs with same degree sequence |
| Girth | Shortest cycle length | Some pairs with same degree/diameter |
| Spectrum | Eigenvalues of $A$ | Most pairs (but not all!) |
| Chromatic number | Min colours for proper colouring | Some pairs |
| Number of triangles | $\frac{1}{6}\operatorname{tr}(A^3)$ | Some pairs |
Non-isomorphic graphs with the same spectrum (cospectral graphs): There exist non-isomorphic graphs with identical eigenvalue spectra. The smallest example has 5 vertices: the star $K_{1,4}$ and the disjoint union $C_4 \cup K_1$ both have adjacency spectrum $\{2, 0, 0, 0, -2\}$ (the smallest connected pair has 6 vertices). This means the spectrum alone does not determine the graph up to isomorphism.
For AI: When a GNN computes a graph-level embedding, it is computing a learned graph invariant. The expressiveness question is: which non-isomorphic graphs can the GNN distinguish? This leads directly to the Weisfeiler-Leman connection.
8.3 The Weisfeiler-Leman Test
The Weisfeiler-Leman (WL) test is a classical algorithm for testing graph isomorphism. It is not always correct (it can fail to distinguish certain non-isomorphic graphs), but it is the theoretical benchmark for GNN expressiveness.
Algorithm (1-WL / Color Refinement):
- Initialise: Assign every vertex the same colour: $c^{(0)}(v) = 1$ for all $v$.
- Iterate: For each vertex, compute a new colour based on its current colour and the multiset of its neighbours' colours:
$$c^{(t+1)}(v) = \mathrm{hash}\Big(c^{(t)}(v),\ \{\!\{\, c^{(t)}(u) : u \in N(v) \,\}\!\}\Big)$$
- Terminate: When the colour partition stabilises (no further refinement occurs).
- Decide: If the multisets of colours differ between $G_1$ and $G_2$, they are NOT isomorphic. If they are the same, the test is inconclusive.
Here $\{\!\{\cdot\}\!\}$ denotes a multiset (a set with multiplicities).
1-WL COLOR REFINEMENT EXAMPLE
========================================================================
Graph G:      Iteration 0:      Iteration 1:
A --- B       All vertices      A: (1, {{1,1}}) -> colour 2
|     |       colour 1          B: (1, {{1,1}}) -> colour 2
|     |                         C: (1, {{1,1}}) -> colour 2
D --- C                         D: (1, {{1,1}}) -> colour 2
Every vertex keeps the same colour at every iteration -> 1-WL
can never distinguish the vertices of a regular graph such as C_4
========================================================================
Theorem (Xu et al., 2019; Morris et al., 2019). Message-passing GNNs are AT MOST as powerful as the 1-WL test. That is, if 1-WL cannot distinguish two graphs, no message-passing GNN can distinguish them either.
Conversely, there exist GNN architectures (GIN - Graph Isomorphism Network, Xu et al., 2019) that are exactly as powerful as 1-WL.
The $k$-WL hierarchy. The $k$-dimensional WL test ($k$-WL) colours $k$-tuples of vertices rather than individual vertices. Higher $k$ gives strictly more distinguishing power (in the folklore-WL numbering):
$$1\text{-WL} \prec 2\text{-WL} \prec 3\text{-WL} \prec \cdots$$
Higher-order GNNs (Morris et al., 2019) achieve $k$-WL power but at a computational cost that grows with $n^k$ (one state per $k$-tuple).
For AI: The WL-GNN equivalence is one of the most important theoretical results in graph learning. It tells us:
- Message-passing GNNs have a fundamental expressiveness ceiling
- Some graph structures (e.g., distinguishing regular graphs of the same degree) are provably beyond standard GNNs
- To exceed 1-WL power, we need architectural innovations: higher-order message passing, random features, subgraph counting, or graph transformers
What 1-WL cannot distinguish - a concrete example.
Graph G_1 (two disjoint triangles):     Graph G_2 (one 6-cycle):
1 --- 2       4 --- 5                   1 --- 2
 \   /         \   /                   /       \
  \ /           \ /                   6         3
   3             6                     \       /
                                        5 --- 4
Both graphs: 6 vertices, 6 edges, all degrees = 2 (2-regular). The 1-WL test assigns the same colour to all vertices in both graphs at every iteration - it cannot distinguish them. But they are NOT isomorphic: $G_1$ has two connected components ($c = 2$) while $G_2$ is connected ($c = 1$). A standard message-passing GNN also cannot distinguish these two graphs!
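A minimal sketch of 1-WL colour refinement in plain Python, run on exactly this pair (hashing is replaced by renumbering sorted signatures, which is equivalent for the test):

```python
def wl_colours(adj, rounds=3):
    """1-WL colour refinement; adj maps vertex -> list of neighbours.
    Returns the final multiset of colours as a sorted list."""
    colour = {v: 0 for v in adj}                  # all vertices start alike
    for _ in range(rounds):
        # signature = (own colour, sorted multiset of neighbour colours)
        sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
               for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colour = {v: palette[sig[v]] for v in adj}
    return sorted(colour.values())

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# True: identical colour multisets, so 1-WL cannot tell the graphs apart
print(wl_colours(two_triangles) == wl_colours(six_cycle))
```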
The Graph Isomorphism Network (GIN, Xu et al., 2019). GIN achieves 1-WL expressiveness by using the aggregation:
$$h_v^{(t+1)} = \mathrm{MLP}\Big( (1 + \epsilon)\, h_v^{(t)} + \sum_{u \in N(v)} h_u^{(t)} \Big)$$
The sum aggregator (not mean or max) is critical - it preserves multiset information about neighbour counts, which is equivalent to the multiset hashing in 1-WL.
8.4 Graph Automorphisms
Definition (Automorphism). An automorphism of a graph is an isomorphism from to itself: a bijection that preserves adjacency.
Definition (Automorphism Group). The set of all automorphisms of $G$ forms a group under composition, denoted $\mathrm{Aut}(G)$.
Examples:
- $\mathrm{Aut}(K_n) = S_n$ (the symmetric group - any permutation preserves complete adjacency)
- $\mathrm{Aut}(C_n) = D_n$ (the dihedral group of order $2n$ - rotations and reflections of the cycle)
- $\mathrm{Aut}(P_n) \cong \mathbb{Z}_2$ for $n \ge 2$ (only the identity and the end-to-end reversal)
Definition (Orbit). The orbit of a vertex $v$ under $\mathrm{Aut}(G)$ is $\{\phi(v) : \phi \in \mathrm{Aut}(G)\}$ - the set of vertices that $v$ can be mapped to. Vertices in the same orbit are "structurally equivalent."
For AI: Automorphisms represent graph symmetries. A GNN that is permutation-equivariant respects all graph automorphisms: if two vertices are in the same orbit (structurally identical), the GNN assigns them the same representation (in the absence of distinguishing node features). This is why 1-WL (and standard GNNs) assign the same colour to all vertices in a regular graph - they are all in the same automorphism orbit.
8.5 Graph Motifs
Definition (Subgraph Motif). A graph motif is a small, recurring subgraph pattern that appears in a network significantly more often than in a random graph with the same degree sequence. Motifs are the structural "building blocks" of complex networks.
The canonical small motifs:
Triangle (K_3)    3-Path (P_3)    4-Cycle (C_4)    Star (S_3)
1 --- 2           1 - 2 - 3       1 - 2               1
 \   /                            |   |              /|\
  \ /                             4 - 3             2 3 4
   3
| Motif | Vertices | Edges | Significance |
|---|---|---|---|
| Triangle | 3 | 3 | Social triadic closure; clustering coefficient |
| 3-path | 3 | 2 | Shortest paths; betweenness centrality |
| 4-cycle | 4 | 4 | Bipartite structure; absent in trees |
| Star $S_k$ | $k + 1$ | $k$ | Hub nodes; scale-free networks |
| Wedge (open triangle) | 3 | 2 | Potential triangle; closure probability |
| 4-clique | 4 | 6 | Dense community core |
Triangle counting. The number of triangles in $G$ is:
$$\#\text{triangles} = \frac{1}{6} \operatorname{tr}(A^3)$$
since $(A^3)_{vv}$ counts closed walks of length 3 starting and ending at vertex $v$, and each triangle is counted twice at each of its three vertices (once clockwise, once counterclockwise), hence the factor $\frac{1}{6}$.
Local clustering coefficient. The clustering coefficient of vertex $v$ measures the fraction of $v$'s neighbour pairs that are themselves connected:
$$C_v = \frac{|\{\{u, w\} \in E : u, w \in N(v)\}|}{\binom{\deg(v)}{2}}$$
When $\deg(v) < 2$, define $C_v = 0$. The global clustering coefficient (transitivity) is:
$$C = \frac{3 \times \#\text{triangles}}{\#\text{wedges}}$$
(A more convenient form: $C = \operatorname{tr}(A^3) \big/ \sum_v \deg(v)(\deg(v) - 1)$.)
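A minimal numpy sketch of the trace formulas, checked on $K_4$ (which has $\binom{4}{3} = 4$ triangles and transitivity 1):

```python
import numpy as np

def triangle_count(A):
    """Number of triangles = tr(A^3) / 6 for a simple undirected graph."""
    return int(round(np.trace(A @ A @ A) / 6))

A_K4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
print(triangle_count(A_K4))   # 4

# global clustering coefficient: tr(A^3) / sum_v deg(v) * (deg(v) - 1)
deg = A_K4.sum(axis=1)
print(np.trace(A_K4 @ A_K4 @ A_K4) / (deg * (deg - 1)).sum())   # 1.0
```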
Motif frequency vector as a graph fingerprint. The counts of each motif type in a graph form a motif frequency vector $\mathbf{m}(G) = (\#\text{triangles}, \#\text{4-cycles}, \dots)$. These vectors:
- Are invariant to vertex relabelling (graph invariants).
- Capture structural information beyond degree sequences.
- Can distinguish many non-isomorphic graphs that simpler invariants cannot separate.
For AI: Motif counting is directly connected to GNN expressiveness. Standard message-passing GNNs (1-WL power) can count stars around each node but cannot count triangles or 4-cycles in the local neighbourhood without additional architecture. This was a key finding in Bouritsas et al. (2022) "Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting", which proposed encoding motif counts as additional node features to boost GNN expressiveness beyond 1-WL. Similarly, the GraphSAGE and GAT architectures implicitly learn to weight neighbour aggregations, but are blind to whether those neighbours are mutually connected (triangle detection requires at least a 2-hop view).
Practical implication. When designing a GNN for tasks where triangles matter (social network communities, protein binding sites, knowledge graph triples), consider (a) adding pre-computed triangle counts as node features, (b) using a higher-order GNN ($k$-WL based, $k \ge 2$), or (c) using a subgraph GNN that routes information through induced subgraphs.
9. Graph Coloring
9.1 Vertex Coloring and Chromatic Number
Definition (Proper Coloring). A proper vertex coloring of a graph $G$ is a function $c: V \to \{1, \dots, k\}$ such that $c(u) \neq c(v)$ whenever $\{u, v\} \in E$. Adjacent vertices must receive different colours.
Definition (Chromatic Number). The chromatic number $\chi(G)$ is the minimum number of colours needed for a proper colouring of $G$.
Computing $\chi(G)$: Determining the chromatic number is NP-hard in general. For specific graph families:
| Graph | $\chi$ | Reason |
|---|---|---|
| $K_n$ | $n$ | Every pair adjacent - all need different colours |
| $\bar{K}_n$ (empty) | 1 | No adjacencies - one colour suffices |
| $C_n$ ($n$ even) | 2 | Bipartite - alternate two colours |
| $C_n$ ($n$ odd) | 3 | Not bipartite - need one extra colour |
| Tree ($n \ge 2$) | 2 | Bipartite (no odd cycles) |
| Petersen graph | 3 | 3-regular, girth 5 |
| Planar graph | $\le 4$ | Four colour theorem |
Greedy Coloring Algorithm. Process vertices in some order. Assign each vertex the smallest colour not used by any of its already-coloured neighbours. This uses at most $\Delta(G) + 1$ colours, where $\Delta(G)$ is the maximum degree.
GREEDY COLORING EXAMPLE (path 1-2-3-4-5)
========================================================================
Process in order 1,2,3,4,5:
Vertex 1: neighbours = {} -> colour 1 (first available)
Vertex 2: neighbours = {1:red} -> colour 2 (red used)
Vertex 3: neighbours = {2:blue} -> colour 1 (blue used, red free)
Vertex 4: neighbours = {3:red} -> colour 2 (red used)
Vertex 5: neighbours = {4:blue} -> colour 1 (blue used, red free)
Result: 2 colours (optimal - path is bipartite)
========================================================================
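The boxed run generalises to the following sketch (plain Python; colours are 0-indexed here, and adjacency is a dict with every vertex as a key):

```python
def greedy_colouring(adj, order=None):
    """Assign each vertex the smallest colour unused by its coloured neighbours."""
    colour = {}
    for v in (order or list(adj)):
        taken = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in taken:
            c += 1
        colour[v] = c
    return colour

path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(greedy_colouring(path))   # {1: 0, 2: 1, 3: 0, 4: 1, 5: 0} - two colours
```

Note that the vertex order matters: a bad order can force far more than $\chi(G)$ colours, though never more than $\Delta(G) + 1$.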
Lower bounds on $\chi(G)$:
- Clique number: $\chi(G) \ge \omega(G)$, where $\omega(G)$ is the size of the largest clique. (A clique requires all-different colours.)
- Fractional chromatic number: $\chi(G) \ge \chi_f(G)$, where $\chi_f(G)$ is the linear-programming relaxation of colouring.
- Spectrum (Hoffman bound): $\chi(G) \ge 1 + \lambda_{\max} / |\lambda_{\min}|$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of the adjacency matrix $A$.
Theorem (Brooks, 1941). For any connected graph $G$ that is not a complete graph or an odd cycle: $\chi(G) \le \Delta(G)$.
For AI: Graph coloring is one of the foundational constraint satisfaction problems (CSPs). Many scheduling, timetabling, and resource allocation problems reduce to graph coloring: if two tasks conflict (share a resource), they must receive different "colours" (time slots). Solving CSPs with neural approaches (e.g., reinforcement learning for combinatorial optimization) often operates on the conflict graph. The clique-cover lower bound connects coloring to clique detection, which is used in feature selection (finding maximally correlated feature groups).
9.2 Chromatic Polynomial
Definition (Chromatic Polynomial). The chromatic polynomial $P(G, k)$ counts the number of proper colourings of $G$ using at most $k$ colours.
Theorem (Deletion-Contraction). For any edge $e$:
$$P(G, k) = P(G - e, k) - P(G / e, k)$$
where $G - e$ is $G$ with edge $e$ deleted and $G / e$ is $G$ with edge $e$ contracted.
Examples:
- $P(K_n, k) = k(k-1)(k-2)\cdots(k-n+1)$ (falling factorial)
- $P(\bar{K}_n, k) = k^n$ (no constraints - each vertex independently chooses from $k$ colours)
- $P(T, k) = k(k-1)^{n-1}$ for any tree $T$ on $n$ vertices
Note: $\chi(G) = \min\{k : P(G, k) > 0\}$ - the chromatic number is the smallest $k$ for which the chromatic polynomial is positive.
Worked example - computing $P(C_3, k)$ by deletion-contraction.
$C_3$ is the triangle graph with vertices $\{1, 2, 3\}$ and edges $\{1,2\}, \{2,3\}, \{1,3\}$.
Step 1. Delete edge $e = \{1, 2\}$: the resulting graph is the path $P_3$ (a tree on 3 vertices), so $P(C_3 - e, k) = k(k-1)^2$.
Step 2. Contract edge $e = \{1, 2\}$: vertices 1 and 2 merge into a single vertex $w$, giving $K_2$ (a single edge $\{w, 3\}$), so $P(C_3 / e, k) = k(k-1)$.
Step 3. Apply deletion-contraction:
$$P(C_3, k) = k(k-1)^2 - k(k-1) = k(k-1)(k-2)$$
Verification. For $k = 3$: $P(C_3, 3) = 3 \cdot 2 \cdot 1 = 6$ proper 3-colourings of a triangle. Each of the $3! = 6$ permutations of the three colours gives a valid colouring, confirming the answer.
Verification. For $k = 2$: $P(C_3, 2) = 2 \cdot 1 \cdot 0 = 0$. A triangle cannot be 2-coloured (it contains an odd cycle), consistent with $\chi(C_3) = 3$.
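The result can also be confirmed by brute force - a minimal sketch that counts proper colourings directly (exponential in $n$, so only for tiny graphs):

```python
from itertools import product

def chromatic_value(edges, n_vertices, k):
    """Brute-force P(G, k): count proper colourings with k colours."""
    return sum(all(c[u] != c[v] for u, v in edges)
               for c in product(range(k), repeat=n_vertices))

triangle = [(0, 1), (1, 2), (0, 2)]
print([chromatic_value(triangle, 3, k) for k in range(1, 5)])
# [0, 0, 6, 24] - matches k(k-1)(k-2) at k = 1, 2, 3, 4
```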
Coefficients encode structure. The chromatic polynomial is always a monic polynomial in $k$ of degree $n$ with integer coefficients whose signs alternate. The coefficient of $k^{n-1}$ equals $-m$. Reading off the polynomial immediately reveals: $\chi(G)$ (the smallest positive integer at which $P$ is positive), the number of edges, and whether $G$ is bipartite ($G$ is bipartite iff $P(G, 2) > 0$).
AI connection. Chromatic polynomials appear in probabilistic graphical models: the partition function of the zero-temperature antiferromagnetic Potts model with $k$ states on a graph $G$ equals $P(G, k)$. Computing $P(G, k)$ is #P-hard in general, motivating approximate inference methods (belief propagation, variational methods) that are central to Bayesian deep learning.
9.3 The Four Color Theorem
Theorem (Appel and Haken, 1976). Every planar graph can be properly coloured with at most 4 colours: $\chi(G) \le 4$ for all planar $G$.
This theorem is historically significant for two reasons:
- It was the first major mathematical theorem proved with computer assistance - the proof required checking ~1,500 configurations by computer.
- It resolved a conjecture open since 1852 (124 years!).
The proof was controversial when published because mathematicians could not verify it by hand. A simplified computer-assisted proof was given by Robertson, Sanders, Seymour, and Thomas (1997), and the result has been formally verified in the Coq proof assistant (Gonthier, 2008).
Proof idea (Kempe chains, 1879 - flawed but instructive). Kempe attempted to prove the theorem using "Kempe chains" - maximal connected subgraphs in which only two colours appear. His proof had a flaw (found by Heawood in 1890), but Kempe chains remain a key tool in coloring proofs.
Five Color Theorem (Kempe/Heawood, 1890). Every planar graph is 5-colorable. This weaker result is provable by hand:
Proof sketch. By induction on $n$. A planar graph on $n$ vertices has a vertex $v$ of degree at most 5 (since $m \le 3n - 6$ forces average degree $< 6$). Remove $v$, 5-colour the rest by induction, then restore $v$. If its neighbours use at most 4 distinct colours, $v$ receives the remaining colour. The key case (5 neighbours using all 5 colours) is handled by a Kempe chain argument.
Note: 4 colours are sometimes necessary - $K_4$ is planar and needs 4 colours. But three colours suffice for many "practical" planar graphs (outerplanar graphs, for example, are always 3-colorable).
9.4 Coloring in AI
Graph coloring appears in AI and systems in several important contexts:
Register allocation in compilers. Variables that are simultaneously "live" conflict and cannot share a register. The conflict graph has variables as vertices and edges between simultaneously-live variables. Coloring this graph with $k$ colours assigns $k$ registers. The LLVM and GCC compilers use graph coloring for register allocation.
Frequency assignment. In wireless networks, adjacent cells cannot use the same frequency (interference). The interference graph's chromatic number determines the minimum number of frequencies needed.
Map colouring / geographic visualisation. The original motivation for the four-colour theorem: colouring regions of a map so adjacent regions have different colours. This is equivalent to colouring the dual graph (faces -> vertices, shared borders -> edges).
Constraint satisfaction. Many AI planning and scheduling problems reduce to graph coloring. Exam scheduling: students taking the same two courses cannot have those exams at the same time. Course conflict graph -> minimum number of time slots $= \chi$.
GNN feature augmentation. Adding graph coloring information (e.g., the colour assigned by a greedy algorithm) as node features can improve GNN expressiveness beyond the 1-WL limit, because coloring breaks vertex symmetry.
10. Preview: Representations, Algorithms, and Spectral Methods
This section provides brief pointers to the remaining subsections in the Graph Theory chapter. Each topic listed below has its own dedicated section with full treatment.
10.1 Graph Representations (Preview)
Preview: Graph Representations
Graphs can be stored in memory using several data structures, each with different space-time trade-offs. The adjacency matrix enables matrix operations (spectral methods, GCN) but uses $O(n^2)$ space. The adjacency list uses $O(n + m)$ space and enables fast neighbour enumeration. CSR format (Compressed Sparse Row) combines the benefits of both for sparse graphs and is the standard in large-scale GNN frameworks.
-> Full treatment: Graph Representations
10.2 Graph Algorithms (Preview)
Preview: Graph Algorithms
Classical algorithms compute the properties defined in this section. BFS computes shortest paths in unweighted graphs and identifies connected components. DFS detects cycles, finds bridges, and computes strongly connected components. Dijkstra's algorithm finds shortest paths in weighted graphs. Minimum spanning tree algorithms (Prim, Kruskal) find the cheapest connected spanning subgraph.
-> Full treatment: Graph Algorithms
10.3 Spectral Graph Theory (Preview)
Preview: Spectral Graph Theory
The eigenvalues and eigenvectors of graph matrices (adjacency matrix $A$, Laplacian $L = D - A$, normalised Laplacian $\mathcal{L} = D^{-1/2} L D^{-1/2}$) reveal deep structural properties. The number of zero eigenvalues of $L$ equals the number of connected components. The Fiedler vector (the eigenvector of the second-smallest eigenvalue of $L$) provides the optimal relaxed graph bipartition, forming the basis of spectral clustering. The Cheeger inequality connects the algebraic spectrum to combinatorial expansion.
-> Full treatment: Spectral Graph Theory
10.4 Graph Neural Networks (Preview)
Preview: Graph Neural Networks
GNNs learn vertex and graph-level representations by iterating a message-passing scheme: $h_v^{(t+1)} = \mathrm{UPDATE}\big(h_v^{(t)},\ \mathrm{AGGREGATE}(\{\!\{\, h_u^{(t)} : u \in N(v) \,\}\!\})\big)$. The Graph Convolutional Network (GCN, Kipf & Welling, 2017) uses the normalised adjacency matrix: $H^{(t+1)} = \sigma\big(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(t)} W^{(t)}\big)$ with $\hat{A} = A + I$. Graph Attention Networks (GAT) learn edge weights dynamically. The expressiveness of message-passing GNNs is bounded by the 1-WL test (8.3).
-> Full treatment: Graph Neural Networks
10.5 Random Graphs (Preview)
Preview: Random Graphs
Random graph models generate graphs probabilistically. The Erdos-Renyi model $G(n, p)$ includes each edge independently with probability $p$, producing binomial (approximately Poisson) degree distributions. The Barabasi-Albert model generates scale-free graphs with power-law degree distributions via preferential attachment. The Watts-Strogatz model produces small-world graphs with high clustering and short diameters.
-> Full treatment: Random Graphs