Graph Basics - Part 1: Intuition to Paths, Walks, and Cycles

1. Intuition

1.1 What Is a Graph?

At its most elemental, a graph is a collection of things and connections between things. The "things" are called vertices (or nodes), and the connections are called edges (or links). This definition is deliberately vague about what the things are - and that is exactly the point. The same mathematical framework that describes friendships between people also describes bonds between atoms, links between web pages, and data flow between operations in a neural network.

Consider a simple social network:

SOCIAL NETWORK AS A GRAPH
========================================================================

  Alice ------- Bob              Vertices: {Alice, Bob, Carol, Dave}
    |          / |               Edges:    {Alice-Bob, Alice-Carol,
    |        /   |                          Bob-Carol, Bob-Dave}
    |      /     |
  Carol -/     Dave              4 vertices, 4 edges

========================================================================

The graph abstracts away everything except the relational structure: who knows whom. It doesn't matter whether Alice is 25 or 65, whether she lives in Tokyo or Toronto. The graph captures only the connectivity pattern.
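
To make this concrete in code: a minimal sketch (illustrative, not tied to any library) that stores the social network above as a dictionary mapping each vertex to its set of neighbours:

```python
# Adjacency-set representation of the social network above.
graph = {
    "Alice": {"Bob", "Carol"},
    "Bob":   {"Alice", "Carol", "Dave"},
    "Carol": {"Alice", "Bob"},
    "Dave":  {"Bob"},
}

n = len(graph)                                      # 4 vertices
m = sum(len(nbrs) for nbrs in graph.values()) // 2  # 4 edges (each stored twice)
print(n, m)
```

Nothing about Alice's age or city appears anywhere: the representation is the connectivity pattern and nothing else.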

For AI: This abstraction is precisely why graphs are so powerful in machine learning. A graph neural network processes the structure of relationships, not the raw pixel data or text. When DeepMind's AlphaFold predicts protein structure, it represents amino acid residues as vertices and spatial proximity as edges - the graph structure encodes the physics that determines how the protein folds.

1.2 Why Graphs Matter for AI

Graphs appear in AI in three fundamentally different roles:

1. Graphs as input data. Many real-world datasets are naturally graph-structured: social networks, citation networks, molecular structures, protein interaction networks, scene graphs in computer vision, abstract syntax trees in code analysis. Graph Neural Networks (GNNs) process these directly.

2. Graphs as computational structure. Every neural network defines a directed acyclic graph (DAG) - the computation graph. Vertices are operations (matmul, add, activation), edges carry tensors. Backpropagation is reverse topological traversal of this DAG. PyTorch's autograd and JAX's tracing both build and traverse computation graphs.

3. Graphs as knowledge representation. Knowledge graphs such as Wikidata (billions of triples) and the Google Knowledge Graph represent entities as vertices and relations as labeled edges. Retrieval-augmented generation (RAG) systems traverse these graphs to ground LLM responses in factual knowledge.

THREE ROLES OF GRAPHS IN AI
========================================================================

  +------------------+   +------------------+   +------------------+
  |  GRAPHS AS DATA  |   |  GRAPHS AS COMP. |   |  GRAPHS AS       |
  |                  |   |    STRUCTURE     |   |  KNOWLEDGE       |
  +------------------+   +------------------+   +------------------+
  | Molecules        |   | Autograd DAGs    |   | Wikidata         |
  | Social networks  |   | Model arch.      |   | Google KG        |
  | Citation graphs  |   | Dataflow graphs  |   | ConceptNet       |
  | Scene graphs     |   | Compiler IR      |   | RAG retrieval    |
  +------------------+   +------------------+   +------------------+
       GNNs process         Backprop             LLMs query these
       these directly       traverses these      for grounding

========================================================================
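
The second role is directly observable in code. A small sketch using PyTorch's public autograd API (mentioned above; JAX tracing behaves analogously):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x + 3 * x    # the forward pass records a DAG: Mul and Add nodes, tensor edges
y.backward()         # backpropagation = reverse traversal of that DAG

print(y.grad_fn)     # AddBackward0, the root node of the recorded graph
print(x.grad)        # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2
```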

1.3 Graphs Are Everywhere

To build intuition for the diversity of graphs, consider these concrete examples:

Chemistry: A molecule is a graph where atoms are vertices and chemical bonds are edges. Benzene ($C_6H_6$) is a cycle graph $C_6$ with alternating single and double bonds. Drug discovery uses GNNs to predict molecular properties from graph structure (Gilmer et al., 2017).

Natural Language Processing: A dependency parse tree is a directed graph where words are vertices and grammatical relationships are edges. The sentence "The cat sat on the mat" produces a tree rooted at "sat" with edges to "cat" (subject) and "on" (preposition). Abstract Meaning Representations (AMRs) are DAGs used in semantic parsing.

Social Networks: Facebook's social graph has ~3 billion vertices (users) and hundreds of billions of edges (friendships). The degree distribution follows a power law - most users have few friends, a few "hub" users have thousands.

The Internet: The World Wide Web is a directed graph where pages are vertices and hyperlinks are edges. Google's PageRank algorithm (Page et al., 1999) - the foundation of web search - computes the dominant eigenvector of a modified adjacency matrix.

Biology: Protein-protein interaction networks, gene regulatory networks, neural connectomes (the wiring diagram of a brain) - all are graphs. The human connectome has ~86 billion vertices (neurons) and ~100 trillion edges (synapses).

1.4 The Language of Connections

Graph theory provides a precise vocabulary for describing structural properties that arise across all these domains:

  • How many connections does each object have? -> Degree
  • Can we reach any object from any other? -> Connectivity
  • What is the shortest route between two objects? -> Distance, diameter
  • Are there natural groupings? -> Components, communities
  • Can we split objects into two non-interacting groups? -> Bipartiteness
  • Is there redundancy in connections? -> Cycles
  • Are two networks "the same" structurally? -> Isomorphism

Each of these questions has a rigorous mathematical formalization that we develop in this section. And each has direct consequences for AI: the degree distribution determines how information spreads through a GNN, connectivity determines whether a computation graph can propagate gradients, and the Weisfeiler-Leman isomorphism test is provably equivalent to the expressive power of message-passing GNNs (Xu et al., 2019).

1.5 Historical Timeline

  Year     Milestone                                       Significance
  1736     Euler solves the Königsberg bridge problem      Birth of graph theory - first proof that no Eulerian circuit exists
  1852     Guthrie poses the four color conjecture         Sparked 124 years of research; proved computationally in 1976
  1878     Cayley enumerates trees                         First systematic study of tree structures
  1936     König publishes first graph theory textbook     Formalised bipartite matching, König's theorem
  1959     Erdős and Rényi introduce random graphs         Foundation of probabilistic graph theory
  1959     Dijkstra publishes shortest-path algorithm      Still used in network routing today
  1970s    NP-completeness results (graph coloring,        Showed fundamental limits of graph computation
           Hamiltonian cycle)
  1976     Appel and Haken prove the four color theorem    First major theorem proved by computer
  1998     Watts and Strogatz model small-world networks   Explained "six degrees of separation"
  1999     Page et al. publish PageRank                    Graph eigenvector as the basis of web search
  2005     Semantic Web and knowledge graphs emerge        Graphs as structured knowledge for AI
  2017     Kipf and Welling publish GCN                    Spectral graph convolutions for node classification
  2018     Veličković et al. publish GAT                   Attention mechanisms on graphs
  2019     Xu et al. prove GNN-WL equivalence              1-WL test bounds message-passing GNN expressiveness
  2021-22  Graph Transformers (Graphormer, GPS)            Combining graph structure with transformer attention
  2026     Heterogeneous graph models at scale             Multi-relational graphs for knowledge-augmented LLMs

1.6 What This Section Covers vs. What Comes Later

This section (01) is the vocabulary and core theory of graph theory. It defines what graphs are, introduces the fundamental properties (degree, paths, connectivity, special families, isomorphism, coloring), and provides the mathematical language used throughout the rest of the chapter.

What comes next builds on this vocabulary:

  • 02 Graph Representations - How to store graphs in memory (adjacency matrix, adjacency list, CSR format)
  • 03 Graph Algorithms - Algorithms that compute the properties defined here (BFS, DFS, shortest paths, MST)
  • 04 Spectral Graph Theory - The eigenvalues of graph matrices reveal structural properties
  • 05 Graph Neural Networks - Neural architectures that learn from graph structure
  • 06 Random Graphs - Probabilistic models of graph generation

2. Formal Definitions

2.1 The Graph $G = (V, E)$

Definition (Undirected Graph). An undirected graph is an ordered pair $G = (V, E)$ where:

  • $V$ is a finite, non-empty set called the vertex set (or node set)
  • $E \subseteq \binom{V}{2}$ is a set of unordered pairs of distinct vertices, called the edge set

Here $\binom{V}{2} = \{\{u, v\} : u, v \in V, u \neq v\}$ denotes the set of all 2-element subsets of $V$.

We write $n = |V|$ for the order of $G$ (number of vertices) and $m = |E|$ for the size of $G$ (number of edges).

Notation. An edge between vertices $u$ and $v$ is written as $\{u, v\}$, $uv$, or $(u, v)$ depending on context. In this section we use $\{u, v\}$ for undirected edges and $(u, v)$ for directed edges.

Example 1: The triangle graph $K_3$.

$$V = \{1, 2, 3\}, \quad E = \{\{1,2\}, \{1,3\}, \{2,3\}\}$$
THE TRIANGLE GRAPH K_3
========================================================================

      1                 n = 3 (order)
     / \                m = 3 (size)
    /   \               Every pair connected -> "complete graph"
   2 --- 3

========================================================================

Example 2: The path graph $P_4$.

$$V = \{1, 2, 3, 4\}, \quad E = \{\{1,2\}, \{2,3\}, \{3,4\}\}$$
   1 -- 2 -- 3 -- 4          n = 4, m = 3

Example 3: The cycle graph $C_5$.

$$V = \{1, 2, 3, 4, 5\}, \quad E = \{\{1,2\}, \{2,3\}, \{3,4\}, \{4,5\}, \{5,1\}\}$$

For AI: When a GNN processes a molecular graph, each atom is a vertex in $V$ and each chemical bond is an edge in $E$. The formal definition ensures that the graph is well-defined regardless of how atoms are labeled - only the connection pattern matters.

Key terminology summary:

  Term            Symbol                   Meaning
  Vertex (node)   $v \in V$                An object in the graph
  Edge (link)     $e \in E$                A connection between two vertices
  Order           $n = \lvert V \rvert$    Number of vertices
  Size            $m = \lvert E \rvert$    Number of edges
  Adjacent        $u \sim v$               Vertices $u, v$ connected by an edge
  Incident        $v \in e$                Vertex $v$ is an endpoint of edge $e$
  Neighbour       $u \in \mathcal{N}(v)$   $u$ is adjacent to $v$
  Neighbourhood   $\mathcal{N}(v)$         Set of all neighbours of $v$
  Endpoint        $u, v$ of $\{u,v\}$      The two vertices of an edge

Maximum number of edges. A simple undirected graph on $n$ vertices has at most $\binom{n}{2} = \frac{n(n-1)}{2}$ edges (achieved by $K_n$). A simple directed graph has at most $n(n-1)$ arcs. A graph with $m$ close to $\binom{n}{2}$ is called dense; a graph with $m = O(n)$ is called sparse. Most real-world graphs are sparse.
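
A quick sketch (hypothetical helper) that computes the density $m / \binom{n}{2}$ from an edge list illustrates the dense/sparse distinction:

```python
def density(n, edges):
    """Fraction of the n(n-1)/2 possible edges present in a simple undirected graph."""
    return len(edges) / (n * (n - 1) / 2)

print(density(3, [(1, 2), (1, 3), (2, 3)]))              # K_3 -> 1.0 (dense)
print(density(1000, [(i, i + 1) for i in range(999)]))   # P_1000 -> ~0.002 (sparse)
```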

2.2 Directed vs. Undirected Graphs

Definition (Directed Graph / Digraph). A directed graph (or digraph) is an ordered pair $G = (V, E)$ where $E \subseteq V \times V$ is a set of ordered pairs of vertices. An edge $(u, v)$ is called an arc directed from $u$ (the tail) to $v$ (the head).

The key distinction: in an undirected graph, $\{u, v\} = \{v, u\}$ - the edge has no direction. In a digraph, $(u, v) \neq (v, u)$ - the arc goes from $u$ to $v$, not the reverse.

UNDIRECTED vs. DIRECTED
========================================================================

  Undirected:              Directed:
   A --- B                  A ----> B
   |     |                  ^       |
   |     |                  |       v
   D --- C                  D <---- C

  {A,B} = {B,A}            (A,B) ≠ (B,A)
  "A knows B"              "A follows B"
  Symmetric relation       Asymmetric relation

========================================================================

Example (Undirected): A social network where friendship is mutual - if Alice is friends with Bob, then Bob is friends with Alice. The edge set is symmetric.

Example (Directed): Twitter's follow graph - Alice can follow Bob without Bob following Alice. Citation networks - paper A cites paper B, but B does not cite A.

Example (Directed, AI-critical): A computation graph in PyTorch. Each operation node receives input tensors along incoming arcs and produces output tensors along outgoing arcs. The direction encodes dataflow: gradients propagate in the reverse direction during backpropagation.

Non-example: A "graph" where edges connect a vertex to itself (self-loop) - this violates the $u \neq v$ requirement for simple graphs. We address self-loops in 2.4.

2.3 Weighted Graphs

Definition (Weighted Graph). A weighted graph is a triple $G = (V, E, w)$ where $(V, E)$ is a graph and $w: E \to \mathbb{R}$ is a weight function assigning a real number to each edge.

Common weight interpretations:

  • Distance/cost: In a road network, $w(\{u,v\})$ is the distance between cities $u$ and $v$
  • Similarity/affinity: In a $k$-NN graph, $w(\{u,v\}) = \exp(-\lVert \mathbf{x}_u - \mathbf{x}_v \rVert^2 / 2\sigma^2)$ (Gaussian kernel)
  • Capacity: In a flow network, $w((u,v))$ is the maximum flow along the arc
  • Probability: In a Markov chain, $w((u,v)) = P(\text{transition } u \to v)$

For AI: Attention weights in a transformer define a weighted directed graph over token positions. The attention matrix $\operatorname{softmax}(QK^\top / \sqrt{d_k})$ is precisely the weight matrix of a weighted digraph where $w((i,j))$ is the attention that position $i$ pays to position $j$. Graph Attention Networks (GAT, Veličković et al., 2018) learn these weights as a function of node features.

Non-example (not a weighted graph): A matrix $A \in \mathbb{R}^{n \times n}$ with negative entries - this can be interpreted as a signed weighted graph, but the standard weighted graph definition allows $w: E \to \mathbb{R}$ including negative weights. The distinction matters: negative weights break shortest-path algorithms like Dijkstra but are handled by Bellman-Ford.
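
To make the similarity interpretation concrete, a numpy sketch (parameter values are illustrative) of the Gaussian-kernel weight matrix defined above:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for all pairs of rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # exclude self-loops from the weighted graph
    return W

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])   # three points, one far away
print(gaussian_affinity(X).round(3))                 # near pair ~0.607, far pairs ~0.0
```

In a real $k$-NN graph one would additionally zero out all but the $k$ largest weights per row.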

2.4 Simple Graphs, Multigraphs, and Pseudographs

The formal definition in 2.1 describes a simple graph: no self-loops and no parallel edges. Real-world graphs often violate these constraints:

Definition (Self-loop). A self-loop is an edge from a vertex to itself: $\{v, v\}$ or $(v, v)$.

Definition (Multigraph). A multigraph allows multiple edges between the same pair of vertices. Formally, $E$ is a multiset rather than a set.

Definition (Pseudograph). A pseudograph allows both self-loops and multiple edges.

  Graph Type     Self-loops   Parallel Edges   Example
  Simple graph   No           No               Social network (at most one friendship)
  Multigraph     No           Yes              Transportation network (multiple bus routes)
  Pseudograph    Yes          Yes              State machine (self-transitions)

For AI: Most GNN frameworks assume simple graphs. Self-loops are added explicitly in GCN: $\tilde{A} = A + I_n$ adds a self-loop to every vertex, ensuring that each node's own features are included in the aggregation step. This is a deliberate design choice, not a property of the input graph.
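
The trick is literally one line of numpy (a sketch mirroring the formula, not any framework's API):

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])           # K_3: a simple graph, zero diagonal
A_tilde = A + np.eye(3, dtype=int)  # GCN self-loop trick: every vertex now sees itself
print(A_tilde)                      # ones on the diagonal - deliberately not simple
```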

Non-example: A relation "is the same as" on a set - every element is related to itself, producing self-loops on every vertex. This is a pseudograph, not a simple graph.

2.5 Standard Examples and Non-Examples

Example 1: The Petersen graph. A famous graph with 10 vertices and 15 edges, 3-regular (every vertex has degree 3). It is a counterexample to many graph theory conjectures - it is not Hamiltonian, not 3-edge-colorable (its edge chromatic number is 4), and not planar.

Example 2: The Karate Club graph. Zachary's karate club (1977) - 34 members, 78 edges representing social interactions. The graph split into two communities when the club divided, making it a standard benchmark for community detection. This is the "MNIST of graph learning."

Example 3: A molecule. Water ($H_2O$) has 3 vertices (O, H, H) and 2 edges (O-H bonds). Caffeine ($C_8H_{10}N_4O_2$) has 24 vertices and 25 edges.

Non-example 1: A set without edges. The pair $(\{1,2,3\}, \emptyset)$ is technically a valid graph (the "empty graph" on 3 vertices), but it has no edges - no relationships are captured.

Non-example 2: A "graph" with edges between non-existent vertices. If $V = \{1,2\}$ and $E = \{\{1,3\}\}$, this is not a valid graph because vertex 3 is not in $V$. Edges must connect vertices in the vertex set.

Non-example 3: An ordered list. The sequence $[1, 2, 3, 4]$ is not a graph - it has no explicit edge set. However, we can construct a path graph from it: $V = \{1,2,3,4\}$, $E = \{\{1,2\}, \{2,3\}, \{3,4\}\}$.

2.6 The Null Graph, Empty Graph, and Trivial Graph

These edge cases clarify the boundaries of the definition:

Definition (Trivial Graph). The graph with exactly one vertex and no edges: $G = (\{v\}, \emptyset)$. This is the simplest possible graph.

Definition (Empty Graph). A graph $G = (V, \emptyset)$ with $n \geq 1$ vertices and no edges. Also called an edgeless graph or independent set. Denoted $\bar{K}_n$ (the complement of the complete graph).

Definition (Null Graph). The graph with no vertices and no edges: $G = (\emptyset, \emptyset)$. Some authors exclude this, requiring $V \neq \emptyset$. In this curriculum, we require $|V| \geq 1$.

For AI: The empty graph $\bar{K}_n$ appears when constructing GNN inputs: before adding edges (via $k$-NN or radius graphs), the initial graph has $n$ nodes with features but no connections. The GCN self-loop trick $\tilde{A} = A + I$ on an empty graph gives $\tilde{A} = I$ - each node only aggregates its own features, equivalent to a standard MLP.


3. Degree and Degree Sequences

3.1 Vertex Degree

Definition (Degree). For an undirected graph $G = (V, E)$, the degree of a vertex $v \in V$ is the number of edges incident to $v$:

$$\deg(v) = |\{e \in E : v \in e\}|$$

Equivalently, $\deg(v)$ is the number of neighbours of $v$: the vertices adjacent to $v$.

Definition (In-degree and Out-degree). For a directed graph $G = (V, E)$:

$$\deg^+(v) = |\{(v, u) \in E\}| \quad \text{(out-degree: edges leaving } v\text{)}$$
$$\deg^-(v) = |\{(u, v) \in E\}| \quad \text{(in-degree: edges entering } v\text{)}$$

The total degree of a vertex in a digraph is $\deg(v) = \deg^+(v) + \deg^-(v)$.

DEGREE EXAMPLES
========================================================================

  Undirected:                    Directed:
      A --- B --- C                 A ---> B ---> C
      |           |                 ^             ^
      D --------- E                 |             |
                                    D <--------- E

  deg(A) = 2  (B, D)           deg^+(A) = 1, deg^-(A) = 1
  deg(B) = 2  (A, C)           deg^+(B) = 1, deg^-(B) = 1
  deg(C) = 2  (B, E)           deg^+(C) = 0, deg^-(C) = 2  (sink)
  deg(D) = 2  (A, E)           deg^+(D) = 1, deg^-(D) = 1
  deg(E) = 2  (C, D)           deg^+(E) = 2, deg^-(E) = 0  (source)

========================================================================

For AI: In a GNN, a vertex's degree determines how many messages it receives during message passing. Vertices with very high degree (hubs) aggregate information from many neighbours, while vertices with degree 1 (leaves) receive information from a single source. This is why GCN normalises by $1/\sqrt{\deg(u)\deg(v)}$ - to prevent high-degree nodes from dominating the aggregation.

Notation (Adjacency matrix connection). If $A$ is the adjacency matrix of an undirected graph, then $\deg(v_i) = \sum_{j=1}^n A_{ij}$ - the row sum. The degree matrix $D = \operatorname{diag}(\deg(v_1), \ldots, \deg(v_n))$ is the diagonal matrix of degrees. Both $A$ and $D$ are central to spectral graph theory (04).
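
The row-sum identity translates directly to numpy (a sketch; the matrix encodes the Alice/Bob/Carol/Dave graph from 1.1 in that vertex order):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],   # Alice: Bob, Carol
              [1, 0, 1, 1],   # Bob:   Alice, Carol, Dave
              [1, 1, 0, 0],   # Carol: Alice, Bob
              [0, 1, 0, 0]])  # Dave:  Bob

degrees = A.sum(axis=1)       # deg(v_i) = i-th row sum
D = np.diag(degrees)          # degree matrix
print(degrees)                # [2 3 2 1]
```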

3.2 The Handshaking Lemma

Theorem (Handshaking Lemma). For any undirected graph $G = (V, E)$:

$$\sum_{v \in V} \deg(v) = 2|E|$$

Proof. Each edge $\{u, v\} \in E$ contributes exactly 1 to $\deg(u)$ and exactly 1 to $\deg(v)$. Thus each edge contributes exactly 2 to the total degree sum. Summing over all edges gives $\sum_{v \in V} \deg(v) = 2|E|$. $\square$

Corollary. The number of vertices with odd degree is always even. (If there were an odd number of odd-degree vertices, the total degree sum would be odd, contradicting $2|E|$ being even.)

Directed version. For a digraph: $\sum_{v \in V} \deg^+(v) = \sum_{v \in V} \deg^-(v) = |E|$. Every arc has exactly one tail and one head.

Example. The Petersen graph: 10 vertices, each of degree 3. Total degree sum $= 10 \times 3 = 30 = 2 \times 15$ edges. $\checkmark$

Example. The star graph $S_5$ (one center connected to 4 leaves): $\deg(\text{center}) = 4$, $\deg(\text{leaf}_i) = 1$ for $i = 1, \ldots, 4$. Sum $= 4 + 1 + 1 + 1 + 1 = 8 = 2 \times 4$ edges. $\checkmark$

Application: Average degree. The average degree of a graph is $\bar{d} = \frac{1}{n}\sum_{v \in V}\deg(v) = \frac{2m}{n}$. For sparse graphs ($m = O(n)$), the average degree is $O(1)$. For dense graphs ($m = \Theta(n^2)$), the average degree is $\Theta(n)$.

  Graph                   $n$      $m$           Average degree $2m/n$
  $K_n$                   $n$      $n(n-1)/2$    $n - 1$
  $C_n$                   $n$      $n$           $2$
  Tree on $n$ vertices    $n$      $n - 1$       $2 - 2/n \approx 2$
  $K_{m,n}$               $m + n$  $mn$          $2mn/(m + n)$

For AI: The handshaking lemma provides a quick sanity check when constructing graphs for GNN input. If you compute degrees from an edge list and the sum is odd, your graph construction has a bug. In large-scale graph datasets (OGB, ogbg-molhiv), this is a standard validation step. The average degree also determines the computational cost of one GNN layer: each message-passing step processes $O(m) = O(n\bar{d}/2)$ edges.
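
A sketch of that validation step over a raw edge list (the helper name is illustrative):

```python
from collections import Counter

def handshake_ok(degrees, num_edges):
    """Handshaking-lemma check: degree sum must be even and equal 2|E|."""
    total = sum(degrees)
    return total % 2 == 0 and total == 2 * num_edges

edge_list = [(1, 2), (2, 3), (3, 1)]
deg = Counter()
for u, v in edge_list:
    deg[u] += 1
    deg[v] += 1

print(handshake_ok(deg.values(), len(edge_list)))   # True for a consistent construction
```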

3.3 Degree Sequences

Definition (Degree Sequence). The degree sequence of a graph $G$ is the list of vertex degrees arranged in non-increasing order:

$$d_1 \geq d_2 \geq \cdots \geq d_n$$

Example. The complete bipartite graph $K_{2,3}$: vertices on one side have degree 3, vertices on the other have degree 2. Degree sequence: $(3, 3, 2, 2, 2)$.

Definition (Graphic Sequence). A non-increasing sequence of non-negative integers $(d_1, \ldots, d_n)$ is graphic if there exists a simple graph with that degree sequence.

Theorem (Erdős-Gallai, 1960). A non-increasing sequence $(d_1, d_2, \ldots, d_n)$ of non-negative integers with even sum is graphic if and only if for each $k \in \{1, 2, \ldots, n\}$:

$$\sum_{i=1}^k d_i \leq k(k-1) + \sum_{i=k+1}^n \min(d_i, k)$$

Example. Is $(3, 3, 2, 2, 2)$ graphic? Sum $= 12$ (even). Check $k = 1$: $3 \leq 0 + \min(3,1) + \min(2,1) + \min(2,1) + \min(2,1) = 0 + 1 + 1 + 1 + 1 = 4$. Yes. (Full check confirms graphicality - it is the degree sequence of $K_{2,3}$.)

Non-example. Is $(3, 3, 3, 1)$ graphic? Sum $= 10$ (even), $n = 4$. Check $k = 1$: $3 \leq 0 + \min(3,1) + \min(3,1) + \min(1,1) = 3$. Equality holds. Check $k = 3$: $3 + 3 + 3 = 9 \leq 3 \cdot 2 + \min(1,3) = 6 + 1 = 7$. Fails! Not graphic - you cannot build a simple graph on 4 vertices where three vertices have degree 3 and one has degree 1.
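
The inequality is mechanical to check; a sketch (hypothetical helper) that reproduces both verdicts above:

```python
def is_graphic(seq):
    """Erdős-Gallai test: True iff `seq` is the degree sequence of some simple graph."""
    d = sorted(seq, reverse=True)        # ensure non-increasing order
    n = len(d)
    if sum(d) % 2 != 0:                  # odd degree sum: impossible (handshaking lemma)
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(di, k) for di in d[k:])
        if lhs > rhs:
            return False
    return True

print(is_graphic([3, 3, 2, 2, 2]))   # True  (degree sequence of K_{2,3})
print(is_graphic([3, 3, 3, 1]))      # False (fails the k = 3 check)
```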

3.4 Regular Graphs

Definition ($k$-Regular Graph). A graph is $k$-regular if every vertex has degree $k$: $\deg(v) = k$ for all $v \in V$.

Examples:

  • 0-regular: The empty graph $\bar{K}_n$ (no edges)
  • 1-regular: A perfect matching (disjoint edges covering all vertices; requires even $n$)
  • 2-regular: A disjoint union of cycles
  • 3-regular (cubic): The Petersen graph, the complete bipartite graph $K_{3,3}$
  • $(n-1)$-regular: The complete graph $K_n$ (every vertex connected to every other)

Proposition. A $k$-regular graph on $n$ vertices has exactly $m = kn/2$ edges (by the handshaking lemma). This requires $kn$ to be even.

For AI: Regular graphs are the "uniform" case in GNN analysis. On a $k$-regular graph, the GCN normalisation factor $1/\sqrt{\deg(u)\deg(v)} = 1/k$ is constant, so GCN reduces to simple averaging of neighbour features. This makes regular graphs the easiest case for theoretical GNN analysis (e.g., proving the WL-equivalence result of Xu et al., 2019).

3.5 Degree Distributions in Real-World Graphs

Real-world graphs rarely have uniform degree distributions. Two patterns dominate:

Poisson degree distribution. In Erdős-Rényi random graphs $G(n, p)$, the degree of each vertex is approximately $\operatorname{Binomial}(n-1, p) \approx \operatorname{Poisson}(\lambda)$ where $\lambda = (n-1)p$. Most vertices have degree close to $\lambda$.

Power-law degree distribution. In many real-world networks (social, citation, web), the fraction of vertices with degree $k$ follows:

$$P(\deg = k) \propto k^{-\gamma}, \quad \gamma \in (2, 3) \text{ typically}$$

This is called a scale-free distribution. It means a few "hub" vertices have enormously high degree while most vertices have low degree.

  Network                Type          Typical $\gamma$
  WWW (out-degree)       Scale-free    $\gamma \approx 2.1$
  Citation networks      Scale-free    $\gamma \approx 3.0$
  Protein interactions   Scale-free    $\gamma \approx 2.4$
  Erdős-Rényi            Poisson       N/A
  Road networks          Near-uniform  N/A ($\deg \approx 3{-}4$)

For AI: Power-law degree distributions create challenges for GNNs. Hub vertices with degree $> 1000$ dominate mini-batch sampling (GraphSAGE, Hamilton et al., 2017) and can cause representation collapse - the "over-squashing" problem. Solutions include degree-based sampling, virtual nodes, and graph transformers that bypass local message passing.

Clustering coefficient. Beyond degree, an important local structural measure is the clustering coefficient, which quantifies how tightly interconnected a vertex's neighbourhood is.

Definition (Local Clustering Coefficient). For a vertex $v$ with $\deg(v) \geq 2$:

$$C(v) = \frac{\text{number of edges among neighbours of } v}{\binom{\deg(v)}{2}}$$

This is the fraction of possible edges among $v$'s neighbours that actually exist. $C(v) = 1$ means the neighbourhood is a clique; $C(v) = 0$ means no two neighbours are connected.

Definition (Global Clustering Coefficient). The average over all vertices: $C = \frac{1}{n}\sum_{v} C(v)$.

Alternative: Transitivity. $T(G) = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}} = \frac{\operatorname{tr}(A^3)}{\mathbf{1}^\top A^2 \mathbf{1} - \operatorname{tr}(A^2)}$.
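
A numpy sketch of the trace formula, checked on $K_3$ (every connected triple closes into a triangle, so $T = 1$):

```python
import numpy as np

def transitivity(A):
    """T(G) = tr(A^3) / (1^T A^2 1 - tr(A^2))."""
    A2 = A @ A
    six_triangles = np.trace(A2 @ A)       # tr(A^3) = 6 * (number of triangles)
    two_triples = A2.sum() - np.trace(A2)  # 2 * (number of connected triples)
    return six_triangles / two_triples if two_triples else 0.0

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # K_3
print(transitivity(A))                            # 1.0
```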

  Network type                   Typical $C$    Interpretation
  Erdős-Rényi $G(n,p)$           $\approx p$    Low, unclustered
  Social networks                $0.1$-$0.5$    High, clustered (friends of friends are friends)
  Protein interactions           $0.1$-$0.3$    Medium clustering
  Regular lattice                High           Very clustered but large diameter
  Small-world (Watts-Strogatz)   High           High clustering AND small diameter

For AI: The clustering coefficient is a standard graph-level feature in GNN benchmarks. High clustering indicates the presence of many triangles - triangle-counting is a canonical task that standard 1-WL GNNs cannot perform exactly (triangles require 3-WL). This motivates higher-order GNN architectures that explicitly count triangular motifs.

Note: The full theory of degree distributions and random graph models is developed in 06 Random Graphs. Here we introduce the concept; there we formalise it.


4. Paths, Walks, and Cycles

4.1 Walks, Trails, and Paths

These three concepts form a hierarchy of increasing restriction on vertex/edge repetition:

Definition (Walk). A walk of length $k$ in $G$ is a sequence of vertices $v_0, v_1, \ldots, v_k$ such that $\{v_{i-1}, v_i\} \in E$ for each $i \in \{1, \ldots, k\}$. Vertices and edges may repeat.

Definition (Trail). A trail is a walk in which no edge is repeated (but vertices may repeat).

Definition (Path). A path is a walk in which no vertex is repeated (and hence no edge is repeated).

WALK vs. TRAIL vs. PATH
========================================================================

  Graph:   A --- B --- C
           |     |     |
           D --- E --- F

  Walk:    A -> B -> E -> B -> C          (edge B-E reused, B visited twice - OK for walk)
  Trail:   A -> B -> E -> F -> C -> B     (no edge repeated, B visited twice)
  Path:    A -> B -> E -> F -> C          (no vertex repeated)

  Restriction:   Walk ⊇ Trail ⊇ Path
                 (every path is a trail, every trail is a walk)

========================================================================

Notation. The length of a walk/trail/path is the number of edges traversed (not vertices). A path from $u$ to $v$ is called a $u$-$v$ path.

Proposition. If there exists a walk from $u$ to $v$, then there exists a path from $u$ to $v$. (Proof: remove cycles from the walk to obtain a path.)
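
The hierarchy can be checked mechanically. A sketch (hypothetical helper) that classifies a vertex sequence over the figure's graph:

```python
def classify(adj, seq):
    """Return 'path', 'trail', 'walk', or 'not a walk' for a vertex sequence."""
    steps = list(zip(seq, seq[1:]))
    if any(v not in adj[u] for u, v in steps):
        return "not a walk"                               # a non-edge was used
    if len(set(seq)) == len(seq):
        return "path"                                     # no vertex repeats
    if len({frozenset(s) for s in steps}) == len(steps):
        return "trail"                                    # vertices repeat, edges do not
    return "walk"

adj = {"A": {"B", "D"}, "B": {"A", "C", "E"}, "C": {"B", "F"},
       "D": {"A", "E"}, "E": {"B", "D", "F"}, "F": {"C", "E"}}
print(classify(adj, list("ABEBC")))    # walk  (edge B-E reused)
print(classify(adj, list("ABEFCB")))   # trail (B revisited, no edge reused)
print(classify(adj, list("ABEFC")))    # path
```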

For AI: In a GNN with $k$ message-passing layers, information from vertex $u$ can reach vertex $v$ if and only if there exists a walk of length $\leq k$ from $u$ to $v$. The GNN's receptive field at depth $k$ is exactly the set of vertices reachable by walks of length $\leq k$. Note: it is walks (not paths) because GNNs can re-visit vertices through different neighbours at different layers.

4.2 Cycles

Definition (Cycle). A cycle of length $k$ (also called a $k$-cycle) is a closed walk $v_0, v_1, \ldots, v_k$ where $v_0 = v_k$, all other vertices are distinct, and $k \geq 3$.

Definition (Girth). The girth of a graph is the length of its shortest cycle. If the graph has no cycles (is acyclic), the girth is defined as $\infty$.

Examples:

  • $K_3$ (triangle): girth $= 3$
  • $K_4$: girth $= 3$ (contains many triangles)
  • $C_n$ (cycle graph): girth $= n$
  • Any tree: girth $= \infty$ (acyclic)
  • Petersen graph: girth $= 5$ (smallest cycle has 5 vertices)

Definition (Acyclic Graph). A graph with no cycles. An undirected acyclic graph is a forest; a connected forest is a tree.

For AI: Cycles in computation graphs would create infinite loops during forward pass evaluation. This is why computation graphs are DAGs (directed acyclic graphs) - acyclicity ensures a valid topological ordering for sequential evaluation. Recurrent Neural Networks (RNNs) appear to have cycles, but when "unrolled" over time steps, the computation graph is a DAG.

4.3 Shortest Paths and Distance

Definition (Distance). The distance $d(u, v)$ between vertices $u$ and $v$ is the length of the shortest $u$-$v$ path. If no such path exists (they are in different components), $d(u, v) = \infty$.

Definition (Eccentricity). The eccentricity of a vertex $v$ is $\varepsilon(v) = \max_{u \in V} d(v, u)$ - the distance to the farthest vertex.

Definition (Diameter). The diameter of a graph is $\operatorname{diam}(G) = \max_{v \in V} \varepsilon(v) = \max_{u, v \in V} d(u, v)$ - the maximum distance between any two vertices.

Definition (Radius). The radius of a graph is $\operatorname{rad}(G) = \min_{v \in V} \varepsilon(v)$.

Definition (Center). The center of a graph is the set of vertices with minimum eccentricity: $\{v : \varepsilon(v) = \operatorname{rad}(G)\}$.

Examples:

  • Path graph $P_n$: diameter $= n - 1$, radius $= \lceil (n-1)/2 \rceil$
  • Cycle $C_n$: diameter $= \lfloor n/2 \rfloor$
  • Complete graph $K_n$: diameter $= 1$ (every pair is adjacent)
  • Star graph $S_n$ (one center connected to $n-1$ leaves): diameter $= 2$

Proposition. For any connected graph: $\operatorname{rad}(G) \leq \operatorname{diam}(G) \leq 2 \cdot \operatorname{rad}(G)$.

Proof sketch. The left inequality is immediate: the max over all vertices is $\geq$ the min. For the right inequality: let $u, v$ achieve the diameter and let $c$ be a center vertex (achieving the radius). Then $d(u,v) \leq d(u,c) + d(c,v) \leq \varepsilon(c) + \varepsilon(c) = 2 \cdot \operatorname{rad}(G)$. $\square$

Distance properties (metric space). The graph distance function $d: V \times V \to \mathbb{N} \cup \{0\}$ is a metric:

  1. Non-negativity: $d(u,v) \geq 0$, with $d(u,v) = 0 \iff u = v$
  2. Symmetry: $d(u,v) = d(v,u)$ (for undirected graphs)
  3. Triangle inequality: $d(u,w) \leq d(u,v) + d(v,w)$

This means the vertex set of any connected graph forms a metric space under graph distance. This observation connects graph theory to metric geometry and is used in graph embedding methods (e.g., embedding graph vertices into Euclidean space while approximately preserving distances).

Distance matrix. The distance matrix $D \in \mathbb{N}^{n \times n}$ has entries $D_{ij} = d(v_i, v_j)$. This is distinct from the degree matrix (also sometimes denoted $D$). The distance matrix contains the full pairwise distance information of the graph and is used as a positional encoding in graph transformers (Li et al., 2020).

DISTANCE MATRIX EXAMPLE (P_4: 1-2-3-4)
========================================================================

  Distance matrix:          Eccentricities:
       1  2  3  4           ε(1) = 3  (farthest: vertex 4)
  1 [  0  1  2  3 ]         ε(2) = 2  (farthest: vertex 4)
  2 [  1  0  1  2 ]         ε(3) = 2  (farthest: vertex 1)
  3 [  2  1  0  1 ]         ε(4) = 3  (farthest: vertex 1)
  4 [  3  2  1  0 ]
                            Diameter = 3, Radius = 2
                            Center = {2, 3}

========================================================================
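
All of these quantities reduce to single-source shortest-path distances. A minimal BFS sketch (the algorithm itself is developed in 03) that reproduces the $P_4$ numbers above:

```python
from collections import deque

def bfs_distances(adj, s):
    """Unweighted shortest-path distances from s by breadth-first search."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}                 # P_4
ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}  # eccentricities
print(ecc)                                                   # {1: 3, 2: 2, 3: 2, 4: 3}
print(max(ecc.values()), min(ecc.values()))                  # diameter 3, radius 2
print([v for v, e in ecc.items() if e == min(ecc.values())]) # center [2, 3]
```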

For AI: The diameter of a graph determines the minimum number of GNN layers needed to propagate information between the two most distant vertices. If $\operatorname{diam}(G) = d$, a GNN with fewer than $d$ layers cannot use the full graph structure for prediction. The "small-world" property - most real-world graphs have $\operatorname{diam}(G) = O(\log n)$ - explains why GNNs with 2-3 layers often suffice in practice.

Note: Algorithms for computing shortest paths (BFS for unweighted, Dijkstra for weighted, Bellman-Ford for negative weights) are developed in 03 Graph Algorithms.

4.4 Eulerian and Hamiltonian Paths

Definition (Eulerian Trail/Circuit). An Eulerian trail is a trail that visits every edge exactly once. An Eulerian circuit is a closed Eulerian trail (starts and ends at the same vertex).

Theorem (Euler, 1736). A connected graph has an Eulerian circuit if and only if every vertex has even degree. It has an Eulerian trail (but not a circuit) if and only if exactly two vertices have odd degree.

This was the first theorem of graph theory, proved by Euler to show that the Königsberg bridge problem has no solution (the graph of Königsberg had four vertices, all of odd degree).

THE KÖNIGSBERG BRIDGE PROBLEM
========================================================================

  The city of Königsberg (now Kaliningrad) had 7 bridges connecting
  4 landmasses. Can you walk across each bridge exactly once?

  Landmasses as vertices, bridges as edges (‖ = two parallel bridges):

        A                  deg(A) = 3  (odd)
      / ‖                  deg(B) = 5  (odd)
     /  ‖                  deg(C) = 3  (odd)
    D---B                  deg(D) = 3  (odd)
     \  ‖
      \ ‖                  All degrees odd -> no Eulerian trail exists
        C                  (Euler's theorem: need 0 or 2 odd-degree vertices)

========================================================================

Proof sketch (Euler's theorem - necessity). If a graph has an Eulerian circuit, every time the walk enters a vertex on one edge it must leave on a different edge. So edges at each vertex are paired: enter-leave, enter-leave, ... This means every vertex uses an even number of edges, so $\deg(v)$ is even. For an Eulerian trail (not circuit), the start and end vertices each use one unpaired edge, so exactly two vertices have odd degree. $\square$
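
Euler's criterion turns the existence test into one pass over the degrees. A sketch assuming the input graph is connected (Königsberg is stored as neighbour lists, so parallel bridges count twice):

```python
def eulerian_type(adj):
    """Classify a connected (multi)graph by counting odd-degree vertices."""
    odd = [v for v, nbrs in adj.items() if len(nbrs) % 2 == 1]
    if len(odd) == 0:
        return "Eulerian circuit"
    if len(odd) == 2:
        return f"Eulerian trail between {odd[0]} and {odd[1]}"
    return f"neither ({len(odd)} odd-degree vertices)"

konigsberg = {"A": ["B", "B", "D"],
              "B": ["A", "A", "C", "C", "D"],
              "C": ["B", "B", "D"],
              "D": ["A", "B", "C"]}
print(eulerian_type(konigsberg))   # neither (4 odd-degree vertices)
```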

Definition (Hamiltonian Path/Cycle). A Hamiltonian path is a path that visits every vertex exactly once. A Hamiltonian cycle is a cycle that visits every vertex exactly once.

Contrast. Euler: every edge once. Hamilton: every vertex once.

  Property         Eulerian                                 Hamiltonian
  Visits           Every edge once                          Every vertex once
  Existence test   Polynomial (check degrees)               NP-complete
  $K_n$            Eulerian circuit iff $n$ is odd          Always has a Hamiltonian cycle ($n \geq 3$)
  $K_{m,n}$        Eulerian circuit iff $m, n$ both even    Hamiltonian cycle iff $m = n \geq 2$

For AI: The Travelling Salesman Problem (TSP) - finding the shortest Hamiltonian cycle in a weighted complete graph - is a canonical NP-hard combinatorial optimization problem. Neural combinatorial optimization (Vinyals et al., 2015; Kool et al., 2019) uses attention-based models to learn heuristic TSP solutions. The graph structure of TSP is $K_n$ with distance weights.

4.5 Counting Walks with the Adjacency Matrix

Theorem. Let $A$ be the adjacency matrix of a graph $G$. The entry $(A^k)_{ij}$ counts the number of walks of length $k$ from vertex $i$ to vertex $j$.

Proof sketch. By induction on $k$. Base case: $A^1 = A$, and $A_{ij} = 1$ iff there is a walk of length 1 (an edge) from $i$ to $j$. Inductive step: $(A^{k+1})_{ij} = \sum_{\ell} (A^k)_{i\ell} \cdot A_{\ell j}$. A walk of length $k+1$ from $i$ to $j$ consists of a walk of length $k$ from $i$ to some $\ell$, followed by an edge from $\ell$ to $j$. Summing over all possible intermediate vertices $\ell$ gives the matrix product. $\square$

Example. For the triangle $K_3$ with adjacency matrix:

$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad A^2 = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$

$(A^2)_{11} = 2$: there are 2 walks of length 2 from vertex 1 to itself ($1 \to 2 \to 1$ and $1 \to 3 \to 1$). The diagonal entries of $A^2$ equal the vertex degrees.

Corollary. The number of triangles containing vertex $i$ is $(A^3)_{ii} / 2$ (each triangle is counted twice, once in each direction). The total number of triangles in $G$ is $\operatorname{tr}(A^3) / 6$.

Example: walk counts in $K_3$.

$$A^3 = \begin{pmatrix} 2 & 3 & 3 \\ 3 & 2 & 3 \\ 3 & 3 & 2 \end{pmatrix}$$

$(A^3)_{11} = 2$, so vertex 1 is in $2/2 = 1$ triangle. $\operatorname{tr}(A^3) = 6$, so there are $6/6 = 1$ triangle total. Correct for $K_3$.
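
A direct numpy check of the theorem and its corollary on $K_3$:

```python
import numpy as np

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # adjacency matrix of K_3
A2, A3 = A @ A, A @ A @ A
print(A2)                   # diagonal = degrees; off-diagonal = 2-walk counts
print(A3[0, 0] // 2)        # triangles through vertex 1 -> 1
print(np.trace(A3) // 6)    # total triangles -> 1
```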

Extended corollary: subgraph counting.

  Subgraph                     Formula using $A$
  Edges                        $\operatorname{tr}(A^2)/2 = m$
  Triangles                    $\operatorname{tr}(A^3)/6$
  Walks of length $k$ (total)  $\mathbf{1}^\top A^k \mathbf{1}$
  Paths of length $k$          Harder - requires inclusion-exclusion

Connection to GNN layers. After $k$ GNN layers, the aggregated representation of vertex $i$ contains information from all vertices reachable by walks of length $\leq k$ from $i$. The contribution of vertex $j$ to vertex $i$'s representation after exactly $k$ layers is proportional to $(A^k)_{ij}$ (in the linear, non-normalised case). The normalised GCN layer uses $\hat{A} = D^{-1/2}AD^{-1/2}$, so the $k$-layer GCN computes $\hat{A}^k X W$ - a matrix whose entries are walk counts of length $k$ normalised by vertex degrees.
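
A sketch of this linear propagation (illustrative only: real GCN layers interleave weight matrices and nonlinearities, and typically add self-loops first):

```python
import numpy as np

def propagate(A, X, k):
    """k steps of degree-normalised propagation: A_hat^k @ X with A_hat = D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                    # assumes no isolated vertices
    A_hat = A / np.sqrt(np.outer(d, d))
    for _ in range(k):
        X = A_hat @ X                    # one hop of degree-weighted walk mixing
    return X

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)   # K_3
print(propagate(A, np.eye(3), 2).round(3))  # entries ~ normalised 2-walk counts
```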

For AI: This theorem connects graph theory directly to linear algebra. The spectral decomposition $A = Q\Lambda Q^\top$ gives $A^k = Q\Lambda^k Q^\top$, so the number of walks of length $k$ is controlled by the eigenvalues of $A$. The dominant eigenvalue determines the exponential growth rate of walk counts - this is the mathematical foundation of PageRank (stationary distribution of a random walk = dominant eigenvector of the transition matrix). Full spectral analysis is in 04 Spectral Graph Theory.

