Math for LLMs - Graph Theory / Graph Representations

Graph Representations, Part 2: Incidence Matrix to Appendix B: Complexity Reference

7. Incidence Matrix

7.1 Definition and Properties

Definition (Incidence Matrix - Undirected). For a graph $G = (V, E)$ with $n$ vertices and $m$ edges, the incidence matrix $B \in \{0,1\}^{n \times m}$ has:

$$B_{ve} = \begin{cases} 1 & \text{if vertex } v \text{ is an endpoint of edge } e \\ 0 & \text{otherwise} \end{cases}$$

Rows are indexed by vertices, columns by edges. Each column has exactly two 1-entries (one per endpoint). Each row has as many 1-entries as the vertex's degree.

Worked example - path $P_4$: Vertices $\{0,1,2,3\}$, edges $e_1 = \{0,1\}$, $e_2 = \{1,2\}$, $e_3 = \{2,3\}$.

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{array}{l} \leftarrow v_0 \\ \leftarrow v_1 \\ \leftarrow v_2 \\ \leftarrow v_3 \end{array}$$

Basic properties:

  • Each column sums to 2: $\mathbf{1}^\top B_{:,e} = 2$ for all edges $e$
  • Row sum of $v$ equals $\deg(v)$: $B_{v,:} \mathbf{1} = \deg(v)$
  • Space: $O(nm)$ - often worse than the adjacency matrix for dense graphs, but informative for sparse hypergraphs
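
These properties are one assertion away in NumPy. A minimal sketch for the $P_4$ worked example above (the graph and expected values come from the example; the code itself is illustrative):

```python
import numpy as np

# Unsigned incidence matrix for P4: edges e1={0,1}, e2={1,2}, e3={2,3}.
edges = [(0, 1), (1, 2), (2, 3)]
n, m = 4, len(edges)

B = np.zeros((n, m), dtype=int)
for e, (u, v) in enumerate(edges):
    B[u, e] = B[v, e] = 1

assert (B.sum(axis=0) == 2).all()            # every column sums to 2
assert list(B.sum(axis=1)) == [1, 2, 2, 1]   # row sums = degree sequence of P4
```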

7.2 Directed Incidence Matrix

For a directed graph (digraph) $G = (V, E)$ with edges $(u \to v)$:

$$B_{ve} = \begin{cases} +1 & \text{if } v \text{ is the head (target) of } e \\ -1 & \text{if } v \text{ is the tail (source) of } e \\ 0 & \text{otherwise} \end{cases}$$

The signed incidence matrix encodes edge direction. Each column still has exactly two non-zero entries: $+1$ at the head, $-1$ at the tail.

Gradient and divergence on graphs. The directed incidence matrix $B$ plays the role of the discrete gradient operator:

  • $B^\top \mathbf{x}$ gives the difference $x_v - x_u$ across each directed edge $(u \to v)$ - a discrete gradient
  • $B \mathbf{f}$ gives the net flow divergence at each vertex - flow in minus flow out

This structure underlies discrete Hodge theory and graph signal processing.
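
A small sketch of both operators on the directed path $0 \to 1 \to 2 \to 3$ (the vertex values and flows are arbitrary illustrative data):

```python
import numpy as np

# Signed incidence matrix: +1 at the head, -1 at the tail of each edge.
edges = [(0, 1), (1, 2), (2, 3)]          # (tail, head) pairs
n, m = 4, len(edges)

B = np.zeros((n, m))
for e, (u, v) in enumerate(edges):
    B[u, e], B[v, e] = -1.0, +1.0

x = np.array([0.0, 1.0, 4.0, 9.0])        # a "potential" on the vertices
grad = B.T @ x                            # x_v - x_u per edge: [1, 3, 5]

f = np.ones(m)                            # unit flow along every edge
div = B @ f                               # net inflow per vertex: [-1, 0, 0, 1]
```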

7.3 The Identity $L = BB^\top$

Theorem. For an undirected graph, fix an arbitrary orientation of each edge and let $B$ be the resulting signed incidence matrix (as in 7.2). Then:

$$L = B B^\top$$

Proof. The $(i,j)$ entry of $B B^\top$ is $(BB^\top)_{ij} = \sum_e B_{ie} B_{je}$.

  • If $i = j$: $\sum_e B_{ie}^2 = \deg(i)$ (since $B_{ie} \in \{-1, 0, +1\}$, the square is 1 exactly for the edges incident to $i$).
  • If $i \neq j$ and $\{i,j\} \in E$: exactly one edge $e = \{i,j\}$ has both $B_{ie}$ and $B_{je}$ non-zero, and they carry opposite signs, so $(BB^\top)_{ij} = -1$.
  • If $i \neq j$ and $\{i,j\} \notin E$: no edge contributes, so $(BB^\top)_{ij} = 0$.

Thus $(BB^\top)_{ij} = D_{ij} - A_{ij} = L_{ij}$, and the result is independent of the orientation chosen. $\square$

Corollary: $L = BB^\top \succeq 0$ (since $\mathbf{x}^\top BB^\top \mathbf{x} = \lVert B^\top \mathbf{x} \rVert_2^2 \geq 0$).

For the unsigned incidence matrix of 7.1, the same computation gives $BB^\top = D + A$ - the signless Laplacian, a different (also positive semidefinite) matrix. The signs in the directed $B$ are exactly what turn the $+A$ into $-A$.
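
Both identities can be checked numerically. A sketch for $P_4$, reusing the matrices from 7.1 and 7.2 (illustrative code, not a library API):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]
n, m = 4, len(edges)

A = np.zeros((n, n), dtype=int)
for u, v in edges:
    A[u, v] = A[v, u] = 1
L = np.diag(A.sum(axis=1)) - A             # L = D - A

B_signed = np.zeros((n, m), dtype=int)     # arbitrary orientation
B_unsigned = np.zeros((n, m), dtype=int)
for e, (u, v) in enumerate(edges):
    B_signed[u, e], B_signed[v, e] = -1, 1
    B_unsigned[u, e] = B_unsigned[v, e] = 1

assert (B_signed @ B_signed.T == L).all()              # L = B B^T (signed)
assert (B_unsigned @ B_unsigned.T == L + 2 * A).all()  # D + A (signless)
```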

For AI: The identity $L = BB^\top$ gives a factored representation of the Laplacian that is useful in:

  • Graph signal processing: high-frequency signals have large $\mathbf{x}^\top L \mathbf{x} = \lVert B^\top \mathbf{x} \rVert_2^2$
  • Optimisation: $\min_{\mathbf{x}} \mathbf{x}^\top L \mathbf{x}$ subject to constraints is the basis of Laplacian smoothing in GNNs
  • Network flow: $B\mathbf{f} = \mathbf{b}$ is the flow conservation constraint (Kirchhoff's current law)

7.4 Hypergraph Incidence Matrix

The incidence matrix generalises naturally to hypergraphs - graphs where edges (hyperedges) can connect more than two vertices.

Definition (Hypergraph Incidence Matrix). For a hypergraph $\mathcal{H} = (V, \mathcal{E})$ where each hyperedge $e \in \mathcal{E}$ is a subset of vertices:

$$H_{ve} = \begin{cases} 1 & \text{if } v \in e \\ 0 & \text{otherwise} \end{cases}$$

Each column has $\lvert e \rvert$ ones (the size of the hyperedge). The ordinary graph incidence matrix is the special case $\lvert e \rvert = 2$ for all edges.
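
A toy construction (two hyperedges over five vertices, chosen arbitrarily for illustration):

```python
import numpy as np

# Two hyperedges, e.g. two papers and their author sets.
hyperedges = [{0, 1, 2}, {2, 3, 4}]
n = 5

H = np.zeros((n, len(hyperedges)), dtype=int)
for e, members in enumerate(hyperedges):
    for v in members:
        H[v, e] = 1

assert list(H.sum(axis=0)) == [3, 3]      # column sums = hyperedge sizes |e|
```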

For AI: Hypergraph representations arise in:

  • Group interactions: Social events involving $k \geq 2$ participants simultaneously
  • Multi-head attention: Each attention head defines a weighted hyperedge over all query-key pairs
  • Co-authorship networks: Each paper is a hyperedge connecting all its authors
  • Higher-order GNNs: $k$-WL tests and $k$-dimensional GNNs operate on $k$-tuples of vertices, naturally represented as hyperedges

7.5 When to Use the Incidence Matrix

Use the incidence matrix when:

  • The primary operations are edge-based (flow, cut, cycle) rather than vertex-based
  • Working with hypergraphs (the incidence matrix is the natural structure)
  • Implementing Hodge decomposition or discrete exterior calculus
  • Teaching or deriving the Laplacian algebraically ($L = BB^\top$)

Avoid when:

  • $n$ or $m$ is large (space is $O(nm)$, often worse than alternatives)
  • The primary need is neighbour enumeration or edge queries

8. Representation Conversions

8.1 Conversion Complexity Table

| From -> To | Time | Space | Notes |
|---|---|---|---|
| Edge list -> Adjacency matrix | $O(n^2 + m)$ | $O(n^2)$ | Allocate matrix, fill entries |
| Edge list -> Adjacency list | $O(n + m)$ | $O(n + m)$ | Append each edge to both endpoints' lists |
| Edge list -> COO | $O(m)$ | $O(m)$ | Repack into parallel arrays |
| Edge list -> CSR | $O(m \log m)$ | $O(n + m)$ | Sort by source, prefix-sum |
| Adjacency matrix -> Edge list | $O(n^2)$ | $O(m)$ | Scan for non-zeros |
| Adjacency matrix -> CSR | $O(n^2)$ | $O(n + m)$ | Scan row by row |
| COO -> CSR | $O(m \log m)$ | $O(n + m)$ | Sort rows, prefix-sum |
| CSR -> COO | $O(m)$ | $O(m)$ | Expand row_ptr |
| CSR -> CSC | $O(m)$ | $O(n + m)$ | Transpose |
| CSR -> Adjacency list | $O(n + m)$ | $O(n + m)$ | One list per row |

8.2 Edge List <-> Adjacency Matrix

```python
import numpy as np

def edge_list_to_matrix(edges, n, directed=False, weighted=False):
    """Convert edge list to numpy adjacency matrix. O(n^2 + m)."""
    A = np.zeros((n, n), dtype=float if weighted else int)
    for edge in edges:
        u, v = edge[0], edge[1]
        w = edge[2] if weighted and len(edge) > 2 else 1
        A[u, v] = w
        if not directed:
            A[v, u] = w
    return A

def matrix_to_edge_list(A, directed=False, weighted=False):
    """Convert adjacency matrix to edge list. O(n^2)."""
    n = len(A)
    edges = []
    for i in range(n):
        # For undirected graphs, scan only the upper triangle.
        j_range = range(n) if directed else range(i + 1, n)
        for j in j_range:
            if A[i, j] != 0:
                edges.append((i, j, A[i, j]) if weighted else (i, j))
    return edges
```

8.3 Edge List <-> Adjacency List

```python
def edge_list_to_adj_list(edges, n, directed=False):
    """Convert edge list to adjacency list. O(n + m)."""
    adj = {i: [] for i in range(n)}
    for edge in edges:
        u, v = edge[0], edge[1]
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

def adj_list_to_edge_list(adj, directed=False):
    """Convert adjacency list to edge list. O(n + m)."""
    edges = set()
    for u, nbrs in adj.items():
        for v in nbrs:
            if directed or (v, u) not in edges:
                edges.add((u, v))
    return list(edges)
```

8.4 Adjacency Matrix <-> CSR

```python
import numpy as np
import scipy.sparse as sp

def matrix_to_csr(A):
    """Dense matrix to CSR arrays. O(n^2) scan."""
    n = len(A)
    data, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j in range(n):
            if A[i, j] != 0:
                data.append(A[i, j])
                col_idx.append(j)
        row_ptr.append(len(data))
    return np.array(data), np.array(col_idx), np.array(row_ptr)

# In practice: use scipy.sparse (A_dense is any dense numpy array)
A_csr = sp.csr_matrix(A_dense)
A_dense_back = A_csr.toarray()
```

8.5 COO <-> CSR <-> CSC

```python
import numpy as np
import scipy.sparse as sp

def coo_to_csr(row, col, data, n):
    """COO to CSR via sorting. O(m log m)."""
    order = np.lexsort((col, row))        # sort by (row, col)
    row_sorted, col_sorted = row[order], col[order]
    data_sorted = data[order]
    # Build row_ptr from per-row counts (counting-sort style)
    row_counts = np.bincount(row_sorted, minlength=n)
    row_ptr = np.zeros(n + 1, dtype=int)
    np.cumsum(row_counts, out=row_ptr[1:])
    return data_sorted, col_sorted, row_ptr

# Using scipy (recommended in practice):
A_csr = sp.csr_matrix((data, (row, col)), shape=(n, n))
A_csc = A_csr.tocsc()   # transpose-like conversion, O(n + m)
A_coo = A_csr.tocoo()   # O(m) expansion
```

8.6 PyG <-> NetworkX <-> SciPy

```python
import networkx as nx
import numpy as np
import scipy.sparse as sp
import torch

# NetworkX -> PyG edge_index
def nx_to_edge_index(G):
    edges = list(G.edges())
    if not G.is_directed():
        edges += [(v, u) for u, v in edges]   # add reverse edges for undirected
    return torch.tensor(edges, dtype=torch.long).t().contiguous()

# PyG edge_index -> NetworkX
def edge_index_to_nx(edge_index, directed=True):
    G = nx.DiGraph() if directed else nx.Graph()
    G.add_edges_from(edge_index.t().tolist())
    return G

# SciPy CSR -> PyG edge_index
def csr_to_edge_index(A_csr):
    coo = A_csr.tocoo()
    return torch.from_numpy(np.stack([coo.row, coo.col])).long()

# PyG edge_index -> SciPy CSR
def edge_index_to_csr(edge_index, n):
    row, col = edge_index[0].numpy(), edge_index[1].numpy()
    data = np.ones(len(row))
    return sp.csr_matrix((data, (row, col)), shape=(n, n))
```

9. Heterogeneous and Dynamic Graphs

9.1 Heterogeneous Graphs

A heterogeneous graph (also called hetero-graph or knowledge graph) has multiple types of nodes and/or edges:

$$G = (V, E, \phi, \psi)$$

where $\phi: V \to \mathcal{T}_V$ assigns each vertex a node type and $\psi: E \to \mathcal{T}_E$ assigns each edge a relation type.

Example - academic knowledge graph:

  • Node types: Author, Paper, Venue
  • Edge types: writes (Author->Paper), published_in (Paper->Venue), cites (Paper->Paper), co-author (Author<->Author)

For each relation type $(s, r, t)$ (source type, relation, target type), we maintain a separate adjacency matrix or edge_index tensor:

$$A^{(s,r,t)} \in \{0,1\}^{\lvert V_s \rvert \times \lvert V_t \rvert}$$

With $\lvert \mathcal{T}_V \rvert$ node types and $\lvert \mathcal{T}_E \rvert$ relation types, we have at most $\lvert \mathcal{T}_E \rvert$ such matrices - each typically far smaller than a single monolithic adjacency matrix.

Knowledge graph triple format. Knowledge graphs (Freebase, Wikidata, OGB-MAG) store relations as triples $(h, r, t)$ - head entity, relation, tail entity. This is an edge list over a heterogeneous graph:

```python
triples = [
    ("Albert_Einstein", "nationality", "Germany"),
    ("Theory_of_Relativity", "author", "Albert_Einstein"),
    ...
]
```
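
A minimal sketch for turning such a triple list into per-relation edge_index tensors; the helper name and the on-the-fly vocabularies are illustrative, not a standard API:

```python
from collections import defaultdict
import torch

def triples_to_edge_indices(triples):
    """Group (head, relation, tail) triples into one edge_index per relation,
    assigning integer IDs to entities as they are first seen."""
    ent_id = {}
    rel_edges = defaultdict(list)
    for h, r, t in triples:
        hi = ent_id.setdefault(h, len(ent_id))
        ti = ent_id.setdefault(t, len(ent_id))
        rel_edges[r].append((hi, ti))
    edge_indices = {
        r: torch.tensor(pairs, dtype=torch.long).t().contiguous()
        for r, pairs in rel_edges.items()
    }
    return ent_id, edge_indices
```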

9.2 PyG HeteroData

PyG's HeteroData object generalises Data to heterogeneous graphs:

```python
import torch
from torch_geometric.data import HeteroData

data = HeteroData()

# Node feature matrices per type
data['author'].x = torch.randn(n_authors, 64)
data['paper'].x  = torch.randn(n_papers,  128)

# Edge indices and features per relation type
data['author', 'writes', 'paper'].edge_index  = ...  # (2, m_writes)
data['paper',  'cites',  'paper'].edge_index  = ...  # (2, m_cites)
data['paper',  'cites',  'paper'].edge_attr   = ...  # (m_cites, d_e)

print(data.node_types)   # ['author', 'paper']
print(data.edge_types)   # [('author', 'writes', 'paper'), ('paper', 'cites', 'paper')]
```

Heterogeneous message passing (RGCN, HAN, HGT) iterates over relation types and aggregates messages per relation:

$$\mathbf{h}_v' = \text{AGG}\!\left(\left\{\mathbf{W}_r \mathbf{h}_u : u \in \mathcal{N}_r(v)\right\}_{r \in \mathcal{T}_E}\right)$$

9.3 Dynamic Graphs

A dynamic graph changes over time: edges and vertices are inserted or deleted. The primary representation challenge is supporting updates without full reconstruction.

The adjacency list handles edge insertion in $O(1)$ and deletion in $O(\deg(u))$. It is the standard choice for dynamic graph algorithms.

CSR is static: every insertion or deletion requires rebuilding the three arrays - $O(m)$ total. For high-frequency updates, CSR is inappropriate.
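
A minimal sketch of the dynamic adjacency-list contract (directed for brevity; an undirected version would mirror each update):

```python
from collections import defaultdict

class DynamicAdjList:
    """Digraph with O(1) amortised edge insertion and O(deg(u)) deletion."""

    def __init__(self):
        self.adj = defaultdict(list)

    def add_edge(self, u, v):
        self.adj[u].append(v)             # amortised O(1) list append

    def remove_edge(self, u, v):
        self.adj[u].remove(v)             # O(deg(u)) linear scan
```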

Dynamic CSR alternatives:

  • Chunked CSR: Divide the adjacency list into fixed-size chunks; insertions overflow into a "delta" structure
  • VCSR (Vertex-Centric CSR): Each row is a separate resizeable array; updates are local
  • Sorted adjacency list: Maintain sorted neighbours per vertex for $O(\log \deg)$ edge lookup

9.4 Temporal Graphs and Snapshots

A temporal graph $G = (V, E, T)$ has edges with timestamps $t_e \in T$:

$$E \subseteq V \times V \times T$$

The most common representation is a snapshot list $\{G_0, G_1, \ldots, G_T\}$, where $G_t$ is the graph at time step $t$. Each snapshot can be stored as a separate Data object:

```python
snapshots = [
    Data(x=x_t, edge_index=edge_index_t)
    for t, (x_t, edge_index_t) in enumerate(time_series)
]
```

For AI - temporal GNNs:

  • TGAT (Temporal Graph Attention, 2020): Stores interactions as chronologically sorted edge lists with timestamp features; uses time-encoding functions as positional features
  • DyRep (2019): Maintains a dynamic adjacency list updated via temporal point processes
  • TGN (Temporal Graph Networks, 2020): Uses memory modules per node to summarise temporal history; edge list sorted by timestamp is the core data structure

The temporal edge list $(u, v, t, \text{feat})$, sorted by $t$, is the standard input format for temporal GNN benchmarks (the JODIE and REDDIT datasets in OGB-Temporal).
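
A sketch of that format as parallel arrays, sorted chronologically (the toy interactions are made up):

```python
import numpy as np

# Temporal edge list (u, v, t) as parallel arrays, COO-style.
src = np.array([0, 2, 1, 0])
dst = np.array([1, 3, 2, 3])
ts  = np.array([5.0, 1.0, 3.0, 2.0])

order = np.argsort(ts, kind="stable")     # chronological order
src, dst, ts = src[order], dst[order], ts[order]
```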

9.5 Bipartite Graph Representations

Many real-world graphs are bipartite: the vertices split into two disjoint sets $U$ and $V$, with edges only between $U$ and $V$ - none within $U$ or within $V$.

Examples: User-item interactions (recommendation systems), author-paper graphs, word-document matrices, student-course enrollment.

Bipartite graphs call for rectangular adjacency matrices $A \in \{0,1\}^{\lvert U \rvert \times \lvert V \rvert}$ rather than square $n \times n$ matrices. This changes each representation:

| Representation | Bipartite form |
|---|---|
| Adjacency matrix | $A \in \{0,1\}^{\lvert U \rvert \times \lvert V \rvert}$ (rectangular) |
| Adjacency list | adj_u[u] lists neighbours in $V$; adj_v[v] lists neighbours in $U$ |
| Edge list | $(u, v)$ pairs with $u \in U$, $v \in V$ |
| COO | edge_index with source indices in $[0, \lvert U \rvert)$ and target indices in $[0, \lvert V \rvert)$ |

In PyG, bipartite graphs use a 2-tuple node feature convention:

```python
data = Data(
    x=(x_u, x_v),           # node features for U and V separately
    edge_index=edge_index,  # sources in [0, |U|), targets in [0, |V|)
)
```

For AI - collaborative filtering. The matrix factorisation model for recommendation can be viewed as operating on a bipartite user-item graph with adjacency $A \in \{0,1\}^{n_u \times n_i}$:

$$\hat{R}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$$

where $\mathbf{p}_u$ and $\mathbf{q}_i$ are learned embeddings. LightGCN (He et al., 2020) applies GCN layers directly to this bipartite graph, propagating user embeddings through item nodes and vice versa. The core operation is two rectangular SpMM passes: $A \mathbf{Q}$ (aggregate items -> users) and $A^\top \mathbf{P}$ (aggregate users -> items).
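
A sketch of the two passes with SciPy (random rectangular adjacency and embeddings as stand-ins; LightGCN's degree normalisation is omitted for brevity):

```python
import numpy as np
import scipy.sparse as sp

n_u, n_i, d = 3, 4, 8                     # users, items, embedding dim
rng = np.random.default_rng(0)
A = sp.random(n_u, n_i, density=0.5, format="csr", random_state=0)
P = rng.standard_normal((n_u, d))         # user embeddings
Q = rng.standard_normal((n_i, d))         # item embeddings

P_next = A @ Q                            # items -> users, shape (n_u, d)
Q_next = A.T @ P                          # users -> items, shape (n_i, d)
```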

9.6 Multiplex and Signed Graphs

Multiplex graphs have multiple layers of edges between the same vertex set - different edge types exist in parallel:

$$G = (V, E_1, E_2, \ldots, E_k)$$

Each layer $E_r$ is a separate adjacency matrix $A^{(r)}$. The full representation is a tensor $\mathcal{A} \in \{0,1\}^{n \times n \times k}$ (a stack of adjacency matrices).

Applications: Social networks with multiple relation types (follows, likes, messages), temporal layers (contacts at different time windows), transportation networks (road, rail, air in the same city).

Signed graphs have edges with positive or negative weights - representing trust/distrust, attraction/repulsion, or cooperation/competition. The adjacency matrix $A$ has entries in $\{-1, 0, +1\}$ (or $\mathbb{R}$ for weighted signed graphs). Signed graph Laplacians have non-standard spectral properties; they are studied in signed spectral graph theory.

For AI: Signed graphs arise in:

  • Sentiment analysis: Stance graphs where edges are agree/disagree
  • Financial networks: Positive = correlated assets, negative = anti-correlated
  • Recommendation: Explicit dislikes as negative edges in collaborative filtering
  • RL reward shaping: Signed reward graphs encode preference relations

10. Choosing a Representation

10.1 Decision Framework

REPRESENTATION SELECTION FLOWCHART
========================================================================

                      START
                        |
              Is n <= 10,000 AND dense?
              +---------+----------+
             YES                  NO
              |                   |
    Adjacency Matrix    Is graph dynamic (frequent updates)?
                        +---------+----------+
                       YES                  NO
                        |                   |
              Adjacency List      Target GPU / PyG?
                        +---------+----------+
                       YES                  NO
                        |                   |
                    edge_index          Need SpMV / eigenvalues?
                    (COO/PyG)           +---------+----------+
                                       YES                  NO
                                        |                   |
                                      CSR            Adj. List or
                                (scipy.sparse)       Edge List

========================================================================

10.2 By Graph Size and Density

| Graph Size | Density $\rho$ | Recommended | Rationale |
|---|---|---|---|
| $n \leq 10^3$ | Any | Adjacency matrix | Simplest; $O(n^2) \leq 10^6$ entries - fine |
| $n \leq 10^4$, dense | $\rho > 0.1$ | Adjacency matrix | Dense matmul is GPU-efficient |
| $n \leq 10^6$, sparse | $m \leq 10n$ | CSR or adj. list | $O(n)$ memory |
| $n > 10^6$ | Any | CSR + edge list | Billion-edge graphs need streaming |
| Batch of small graphs | $n_i \leq 100$ | Dense padded tensors or PyG batch | GPU efficiency via padding |

10.3 By Algorithm Type

| Algorithm | Best Representation | Reason |
|---|---|---|
| BFS / DFS | Adjacency list | Sequential neighbour access |
| Dijkstra / Bellman-Ford | Adjacency list (sorted) | Priority queue needs fast neighbour access |
| PageRank / power iteration | CSR | Repeated SpMV |
| Spectral clustering | CSR -> eigsh | Laplacian eigenvectors via ARPACK |
| GCN / MPNN | COO / edge_index | GPU scatter operations |
| Link prediction | Adjacency set | Fast edge existence queries |
| Subgraph sampling | CSR | Row-slice for $k$-hop neighbourhood |
| Shortest path (all pairs) | Adjacency matrix | Floyd-Warshall uses the matrix |

10.4 By Hardware Target

CPU (single node):

  • SciPy CSR for numerical work
  • NetworkX adjacency dict for prototyping
  • Numba-JIT over CSR arrays for performance

GPU (single card):

  • PyG edge_index + cuSPARSE SpMM
  • For very large graphs: PyG with NeighborSampler (CSR-based sampling)
  • DGL for production GNN training (CSR internally, CUDA-optimised)

Multi-GPU / distributed:

  • Partitioned edge lists (METIS graph partitioning)
  • DistDGL, GraphLearn, or AliGraph for trillion-edge graphs
  • Edge list files partitioned on HDFS / S3

TPU:

  • Padded dense adjacency matrices (fixed shape required for XLA compilation)
  • Jraph (JAX graph library) uses padded dense representation

10.5 By Framework

| Framework | Primary format | Notes |
|---|---|---|
| NetworkX | dict of dicts (adjacency dict) | Pure Python; slow but feature-rich |
| PyTorch Geometric | edge_index (COO) | Standard for GNN research |
| DGL | CSR DGLGraph | Optimised for large heterogeneous graphs |
| SciPy / NumPy | CSR csr_matrix | Numerical computing; integrates with ARPACK |
| TensorFlow GNN | GraphTensor (COO-based) | TF-native graph representation |
| Jraph (JAX) | Padded dense GraphsTuple | TPU-optimised |
| SNAP (Stanford) | Binary edge lists | Ultra-large graphs ($> 10^9$ edges) |

10.6 Case Study: Representing the OGB-Arxiv Graph

The OGB-Arxiv dataset (Open Graph Benchmark) is a directed citation network: $n = 169{,}343$ nodes (papers) and $m = 1{,}166{,}243$ edges (citations). Let us trace the representation choices made by three frameworks:

NetworkX (baseline, not recommended for training):

```python
G = nx.DiGraph()
G.add_edges_from(edge_list)
# Memory: ~800 MB (Python dict overhead)
# Time to iterate all edges: ~12 s
```

PyTorch Geometric (research standard):

```python
dataset = NodePropPredDataset('ogbn-arxiv')
data = dataset[0]
# data.edge_index: shape (2, 2_332_486) = bidirectional edges
# data.x: shape (169_343, 128) = feature matrix
# Memory: ~37 MB for edge_index (int64)
# Training time (3-layer GCN, epoch): ~0.4 s on A100
```

DGL (production):

```python
g = dgl.graph((src, dst), num_nodes=n)
# Internal: CSR for forward pass, CSC for backward pass
# Memory: ~20 MB (int32 indices)
# Training time (3-layer GCN, epoch): ~0.3 s on A100 (faster due to CSR)
```

The choice between PyG COO and DGL CSR is a 25% throughput difference at this scale - a significant factor in research iteration speed.


11. Preview: Algorithms and Spectral Methods

11.1 How 03 Graph Algorithms Uses These Representations

Graph algorithms depend critically on the choice of representation. The asymptotic complexities in 03 are derived assuming specific representations:

| Algorithm | Representation | Complexity |
|---|---|---|
| BFS | Adjacency list | $O(n + m)$ |
| Dijkstra (binary heap) | Adjacency list | $O((n + m) \log n)$ |
| Bellman-Ford | Edge list | $O(nm)$ |
| Kruskal MST | Edge list (sorted) | $O(m \log m)$ |
| Prim MST | Adjacency list + heap | $O((n + m) \log n)$ |
| Floyd-Warshall | Adjacency matrix | $O(n^3)$ |
| DFS-based SCC | Adjacency list | $O(n + m)$ |

Using the wrong representation can change the asymptotic complexity: Dijkstra with an adjacency matrix runs in $O(n^2)$ instead of $O((n+m)\log n)$ - a factor of $n / \log n$ slower for sparse graphs. Full algorithmic treatment: 03 Graph Algorithms.

11.2 How 04 Spectral Theory Uses the Laplacian

The Laplacian $L = D - A$ defined in 3.4 is the central object of spectral graph theory. Its eigendecomposition $L = Q \Lambda Q^\top$ reveals:

  • $\lambda_1 = 0$ always (every graph has at least one connected component)
  • The number of zero eigenvalues equals the number of connected components
  • $\lambda_2 > 0$ exactly when the graph is connected; this Fiedler value measures connectivity strength
  • The eigenvector $\mathbf{q}_2$ (the Fiedler vector) partitions the graph for spectral clustering

Computing the Fiedler vector requires eigsh(L, k=2, which='SM') - a sparse eigensolver that takes CSR format as input. The representation choice (CSR) directly enables the spectral analysis.
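
A sketch of that call on the path graph $P_{10}$ (the graph is illustrative; any connected sparse graph works):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 10                                    # path graph P10
rows = list(range(n - 1)) + list(range(1, n))
cols = list(range(1, n)) + list(range(n - 1))
A = sp.csr_matrix((np.ones(2 * (n - 1)), (rows, cols)), shape=(n, n))
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

vals, vecs = eigsh(L, k=2, which="SM")    # two smallest eigenvalues
fiedler_value, fiedler_vector = vals[1], vecs[:, 1]
```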

Full spectral treatment: 04 Spectral Graph Theory.

11.3 How 05 GNNs Use edge_index and Sparse Adjacency

The three core GNN architectures use the representations from this section differently:

| Architecture | Representation Used | Core Operation |
|---|---|---|
| GCN (Kipf & Welling, 2017) | Sparse $\hat{A}$ (CSR or COO) | $H' = \sigma(\hat{A} H W)$ - SpMM |
| GraphSAGE (Hamilton et al., 2017) | Adjacency list (neighbour sampling) | Sample $k$ neighbours per node |
| GAT (Velickovic et al., 2018) | edge_index (COO) | Attention per edge: $\alpha_{ij}$ computed and aggregated |

Full GNN architectures, expressiveness theory, and training details: 05 Graph Neural Networks.


12. Common Mistakes

| # | Mistake | Why It's Wrong | Fix |
|---|---|---|---|
| 1 | Using a dense adjacency matrix for a million-node graph | $O(n^2)$ space is $\approx 10^{12}$ bytes for $n = 10^6$, even at one byte per entry - infeasible on a single machine | Use CSR or edge_index; dense $A$ is only viable for $n \leq 10^4$ |
| 2 | Forgetting that edge_index is directed by default in PyG | Undirected edges require both $(u,v)$ and $(v,u)$ to appear; missing the reverse direction causes asymmetric message passing | Use torch_geometric.utils.to_undirected(edge_index) or add reverse edges explicitly |
| 3 | Writing the transpose as $A^T$ instead of $A^\top$ | Cosmetic but it matters: in NOTATION_GUIDE the transpose is $^\top$, not ^T - a superscript "$T$" reads like a variable name | Write $A^\top$ consistently |
| 4 | Assuming the Laplacian is $A - D$ instead of $D - A$ | The sign matters: $L = D - A$ is positive semidefinite; $A - D$ is negative semidefinite. GCN derivations assume $L \succeq 0$ | Always write $L = D - A$ and verify $\mathbf{x}^\top L \mathbf{x} \geq 0$ |
| 5 | Confusing CSR row_ptr length with $n$ (instead of $n+1$) | row_ptr has $n+1$ entries: row_ptr[0] = 0, row_ptr[n] = m. Reading row_ptr[n] from an $n$-length array crashes | Always allocate row_ptr = np.zeros(n+1); the extra entry is essential |
| 6 | Normalising with degrees computed before adding self-loops in GCN | GCN normalisation is $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ with $\tilde{A} = A + I$. Normalising $\tilde{A}$ with the degrees of the original $A$ uses the wrong degrees | Add self-loops and recompute the degrees before normalising |
| 7 | Using dense eigendecomposition when the graph is sparse and large | np.linalg.eig(A) is $O(n^3)$ - infeasible for $n > 10^4$ | Use scipy.sparse.linalg.eigsh (ARPACK) for sparse matrices; it computes only $k$ eigenvalues at roughly $O(k \cdot m)$ cost |
| 8 | Treating the incidence matrix as $n \times n$ instead of $n \times m$ | The incidence matrix has $n$ rows (vertices) and $m$ columns (edges). Confusing rows and columns gives wrong shapes in $L = BB^\top$ | Verify B.shape == (n, m); B @ B.T (signed $B$) gives $L \in \mathbb{R}^{n \times n}$ |
| 9 | Storing undirected graphs as directed without both directions | If adj[u] contains v but adj[v] doesn't contain u, BFS/DFS give wrong results and message passing is asymmetric | Always add both adj[u].append(v) and adj[v].append(u) for undirected graphs |
| 10 | Using COO format for large repeated SpMV | COO lacks row-pointer structure, so each SpMV needs scattered accumulation or a fresh sort - far slower than CSR's sequential pass. For PageRank or GNN training (many iterations) the overhead compounds | Convert COO to CSR once before the iteration loop; repeated SpMV on CSR is $O(m)$ per call |
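
Mistake 6 is worth seeing in code. A dense sketch of the correct order of operations (production code would use sparse ops; the function name is illustrative):

```python
import numpy as np

def gcn_normalise(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2}, taking degrees AFTER adding
    self-loops -- computing D from the original A is mistake 6."""
    A_tilde = A + np.eye(len(A))
    d = A_tilde.sum(axis=1)               # degrees of A + I, not of A
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt
```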

13. Exercises

Exercise 1 * - Manual CSR Construction

Given the path graph $P_5$ with vertices $\{0,1,2,3,4\}$ and edges $\{0,1\}, \{1,2\}, \{2,3\}, \{3,4\}$:

(a) Write out the adjacency matrix $A$. (b) Manually construct the CSR arrays data, col_idx, row_ptr. (c) Verify that data[row_ptr[2]:row_ptr[3]] gives the non-zeros in row 2. (d) Compute the degree of each vertex from row_ptr.


Exercise 2 * - Space Complexity Comparison

For a graph with $n = 10^5$ vertices and $m = 3 \times 10^5$ edges:

(a) Compute the exact number of bytes required for a dense float32 adjacency matrix. (b) Compute the bytes for the CSR representation (int32 indices, float32 data). (c) Compute the bytes for edge_index (int64 COO). (d) What fill ratio $\rho$ would make the dense matrix and CSR consume equal memory?


Exercise 3 * - Laplacian Derivation

For the star graph $S_4$ (centre vertex 0, leaves 1, 2, 3, 4):

(a) Write the adjacency matrix $A$ and degree matrix $D$. (b) Compute $L = D - A$. (c) Verify $L\mathbf{1} = \mathbf{0}$. (d) Compute the quadratic form $\mathbf{x}^\top L \mathbf{x}$ for $\mathbf{x} = (1, -1, -1, -1, -1)^\top / 2$. What does this value measure?


Exercise 4 ** - Conversion Pipeline

Implement the full conversion pipeline:

(a) Start with the edge list [(0,1),(0,2),(1,3),(2,3),(3,4)], $n = 5$. (b) Convert to an adjacency list (Python dict). (c) Convert the adjacency list to COO arrays. (d) Convert COO to CSR manually (without scipy). (e) Implement SpMV using your CSR arrays; verify against A @ x for $\mathbf{x} = \mathbf{1}$ (this should return the degree sequence).


Exercise 5 ** - Normalised Adjacency

For the cycle graph $C_5$:

(a) Construct $A$ (adjacency matrix) and $D$ (degree matrix). (b) Compute the GCN normalised adjacency $\hat{A} = D^{-1/2} A D^{-1/2}$. (c) Compute the augmented version: $\tilde{A} = A + I$, $\tilde{D} = D + I$, $\hat{\tilde{A}} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$. (d) Verify that the eigenvalues of $\hat{A}$ lie in $[-1, 1]$ and the eigenvalues of $L_{\text{sym}} = I - \hat{A}$ lie in $[0, 2]$. (e) Explain: why does the normalised Laplacian's eigenvalue bound $[0, 2]$ matter for GNN stability?


Exercise 6 ** - Incidence Matrix and Laplacian

For the triangle graph $K_3$:

(a) Construct the unsigned incidence matrix $B \in \{0,1\}^{3 \times 3}$. (b) Compute $B B^\top$ and compare it with $D - A$ and $D + A$: which one does it equal? (c) Construct the directed incidence matrix $B_{\text{dir}}$ (orient the edges $0 \to 1$, $1 \to 2$, $0 \to 2$). (d) Verify $B_{\text{dir}} B_{\text{dir}}^\top = L = D - A$. (e) Interpret $(B_{\text{dir}}^\top \mathbf{x})_e$ for the edge $e = (0 \to 1)$: what is $x_1 - x_0$ measuring geometrically on the graph?


Exercise 7 *** - PyG Data Object from Scratch

Without using PyG's built-in loaders, construct a torch_geometric.data.Data object from scratch for the karate club graph:

(a) Load the karate club edge list from networkx.karate_club_graph(). (b) Build edge_index (COO, bidirectional, shape $2 \times 2m$) without using from_networkx. (c) Build x as a one-hot encoding of each vertex's community (use G.nodes[v]['club']). (d) Compute edge_attr as the normalised weight $w_{ij} / \max_e w_e$ for each edge (karate club edges carry weights in G[u][v]['weight']). (e) Verify: data.num_nodes == 34, data.num_edges == 2*78 == 156, data.is_undirected() == True.


Exercise 8 *** - Heterogeneous Graph Representation

Consider an academic graph with:

  • 100 authors, 200 papers, 20 venues
  • Relations: (author, writes, paper) with 400 edges; (paper, published_in, venue) with 200 edges; (paper, cites, paper) with 800 edges

(a) Design the heterogeneous adjacency representation: how many separate edge_index tensors are needed, and what are their shapes? (b) Compute the total memory for all edge_index tensors (int64, COO, bidirectional where appropriate). (c) Implement a simple heterogeneous degree function: for each author, count the number of papers they wrote. (d) Write the RGCN-style aggregation for author nodes in pseudocode:

$$\mathbf{h}_a^{(l+1)} = \sigma\!\left(\mathbf{W}_{\text{writes}} \cdot \frac{1}{\lvert\mathcal{P}_a\rvert}\sum_{p \in \mathcal{P}_a} \mathbf{h}_p^{(l)}\right)$$

where $\mathcal{P}_a$ is the set of papers written by author $a$. (e) Implement this aggregation using scatter_mean from the torch_scatter package (or Tensor.scatter_reduce with reduce="mean") on the (author, writes, paper) edge_index.


14. Why This Matters for AI (2026 Perspective)

ConceptAI Impact
Adjacency matrix AADefines the GCN propagation rule A^XW\hat{A}XW; walk powers AkA^k count paths used in subgraph GNNs; AA's eigenspectrum is the basis of spectral GNNs
Normalised adjacency A^\hat{A}Core of GCN (Kipf & Welling, 2017): controls feature smoothing across neighbourhoods; symmetric normalisation prevents gradient explosion in deep GNNs
Laplacian L=DAL = D - ADefines graph signal "frequency"; low-frequency signals (smooth) are aligned with small eigenvalues; over-smoothing in deep GNNs corresponds to projecting onto the null space of LL
CSR / CSC formatsEnable O(m)O(m) SpMM in all production GNN frameworks; DGL's CSR-based DGLGraph is the internal representation behind thousands of GNN papers
PyG edge_indexThe de-facto standard for GNN research (2017-present); shapes all of PyG's 80+ model implementations; enables batching, sampling, and differentiable graph operations
Heterogeneous adjacencyRGCN, HAN, HGT, and knowledge graph embedding models (TransE, RotatE) all rely on per-relation adjacency; the 2024 OGB knowledge graph leaderboard is dominated by heterogeneous GNNs
Temporal edge listsTGN, CAWN, and GraphMixer use chronologically sorted edge lists; the OGB-Temporal benchmark measures dynamic link prediction on graphs with 10710^7 timed interactions
Hypergraph incidenceHOGNNs (Higher-Order GNNs) operate on kk-tuples stored as hyperedge incidence matrices; all kk-WL expressiveness results are stated in terms of hypergraph structure
Padded dense adjacencyJAX/Jraph for TPU requires fixed shapes; padded dense AA enables XLA compilation; models like EigenGNN and SAN use full self-attention (implicit dense AA) for small molecular graphs
Sparse attention patternsFlashAttention uses a block-sparse adjacency structure for efficient long-sequence attention; the "local window" and "strided" patterns are special cases of sparse edge_index
Bipartite adjacency $A \in \mathbb{R}^{U
Signed adjacencySentiment analysis (agree/disagree), financial correlation, and social balance theory all require signed graph representations; signed Laplacians have distinct spectral properties
Dynamic edge listsStreaming graph frameworks (Flink-based TGN, Kafka-based dynamic GNNs) ingest temporal edge lists in real time; the edge list with timestamps is the universal streaming format

14.1 Representation as a Research Lever

The history of GNN research shows that representation choices have directly enabled new research directions:

  • 2017: edge_index (COO) in PyG enabled arbitrary graph structures - not just grid or sequence graphs - to be processed in a single framework. This unlocked the explosion of GNN variants.
  • 2019: DGL's CSR-based DGLGraph with heterogeneous support enabled the RGCN, HAN, and HGT architectures for knowledge graphs.
  • 2020: The OGB benchmark's standardised edge list format enabled reproducible comparison across GNN architectures for the first time.
  • 2021: Padded dense adjacency in Jraph/JAX enabled TPU-based GNN training, 10-40x faster than GPU for certain molecular property prediction tasks.
  • 2022: Block-sparse attention patterns (BigBird, Longformer) showed that limiting the attention graph to a sparse edge_index (local window + global tokens) reduces transformer complexity from $O(n^2)$ to $O(n)$.
  • 2023-2024: Graph transformers (GPS, NAGphormer, NodeFormer) use hybrid representations: sparse edge_index for local message passing plus full dense adjacency for global attention, combining the strengths of both.

The lesson: choosing, designing, or combining representations is itself a research contribution - not just an implementation detail.


15. Conceptual Bridge

Looking Back

This section translates the abstract graph theory of 01 into concrete data structures. Every object defined in 01 - adjacency, degree, Laplacian - now has a precise computational form:

  • Adjacency (01, Definition 2.1) -> Adjacency matrix $A$ (3.1): the mathematical definition becomes a 2D array
  • Degree (01, 3.1) -> Row sums of $A$, or row_ptr[i+1] - row_ptr[i] in CSR
  • Handshaking lemma (01, 3.2): $\sum_v \deg(v) = 2m$ -> data.sum() == 2 * num_edges for CSR
  • Graph Laplacian (01, preview) -> Full definition $L = D - A$ (3.4) with quadratic form, PSD proof, and connectivity interpretation

The connection to linear algebra (Ch. 02-03) is now explicit: every graph operation is a matrix operation, and the choice of sparse format determines whether that operation is feasible.

The section also establishes two "bridge" matrices that will recur throughout the rest of the chapter:

  1. The Laplacian $L = D - A$ - defined here, eigendecomposed in 04, used in the GCN derivation in 05
  2. The normalised adjacency $\hat{A} = D^{-1/2} A D^{-1/2}$ - defined here, proved to be a spectral filter in 04, implemented as the GCN propagation rule in 05

Mastering these two matrices - their construction, their properties, and their computational representations - is the central payoff of this section.

Looking Forward

The representations defined here are the input to every algorithm in the rest of the chapter:

  • 03 Graph Algorithms implements BFS, DFS, Dijkstra, Kruskal, and topological sort on these representations. The choice of adjacency list vs. matrix vs. CSR directly determines whether algorithms run in $O(n+m)$ or $O(n^2)$.
  • 04 Spectral Graph Theory eigendecomposes the Laplacian $L = D - A$ defined in 3.4. The scipy.sparse.linalg.eigsh interface takes CSR format as input, connecting the sparse representation to spectral analysis.
  • 05 Graph Neural Networks uses edge_index (COO) for PyG-based GNN implementations. Every GCN, GAT, and GraphSAGE layer operates on the normalised adjacency $\hat{A}$ (3.5) or directly on edge_index via scatter operations.
  • 06 Random Graphs generates graphs via edge lists; analysing their degree distributions and connectivity requires converting to adjacency lists or CSR for efficient computation.

The Big Picture

GRAPH REPRESENTATIONS IN THE CURRICULUM
========================================================================

  01 Graph Basics          02 Graph Representations
  -----------------         --------------------------
  Abstract G=(V,E)    --->   Concrete data structures
  Adjacency concept   --->   Adjacency matrix A, CSR, COO
  Degree definition   --->   Row sums, row_ptr differences
  Laplacian preview   --->   L = D - A (full definition)
                            Normalised A = D^{-1/2}AD^{-1/2}

                    +--------------------------------------+
                    |         02: THE BRIDGE              |
                    |  Abstract ---> Computational          |
                    |  Theory   ---> Implementation         |
                    |  Proofs   ---> Algorithms             |
                    +------+----------------+-------------+
                           |                |
                    03 Algorithms    04 Spectral Theory
                    (use adj. list,   (use L, eigenvectors,
                     edge list, CSR)   CSR + ARPACK)
                           |                |
                           +------+---------+
                                  v
                           05 Graph Neural Networks
                           (edge_index, A, message passing)
                                  v
                           06 Random Graphs
                           (generate edge lists, analyse
                            degree distributions)

========================================================================

Appendix A: Notation Summary

A.1 Graph and Matrix Symbols

The following notation is used consistently throughout this section, following the conventions of docs/NOTATION_GUIDE.md.

| Symbol | Meaning | Section |
|---|---|---|
| $G = (V, E)$ | Graph with vertex set $V$, edge set $E$ | 2.1 |
| $n = \lvert V \rvert$ | Number of vertices | 2.2 |
| $m = \lvert E \rvert$ | Number of edges | 2.2 |
| $A \in \{0,1\}^{n \times n}$ | Adjacency matrix | 3.1 |
| $D = \operatorname{diag}(A\mathbf{1})$ | Degree matrix | 3.4 |
| $L = D - A$ | Graph Laplacian | 3.4 |
| $\hat{A} = D^{-1/2} A D^{-1/2}$ | Symmetrically normalised adjacency | 3.5 |
| $\tilde{A} = A + I$ | Adjacency with self-loops | 3.3 |
| $P = D^{-1} A$ | Random walk transition matrix | 3.5 |
| $B \in \{0,1\}^{n \times m}$ | Incidence matrix (unsigned; signed version in 7.2) | 7.1 |
| $\rho = 2m / n(n-1)$ | Fill ratio (sparsity measure) | 2.3 |
| edge_index $\in \mathbb{Z}^{2 \times m}$ | PyG COO edge tensor | 5.3 |

Appendix B: Complexity Reference

B.1 Per-Operation Complexity

The following table consolidates the per-operation complexity for the core representations. Assume $n$ vertices, $m$ edges, and maximum degree $\Delta$.

Notation: "avg" = average case for hash-based structures; "amort" = amortised over a sequence of operations.

| Representation | Space | Edge query | Neighbours | SpMV | Build |
|---|---|---|---|---|---|
| Adjacency matrix | $O(n^2)$ | $O(1)$ | $O(n)$ | $O(n^2)$ | $O(n^2 + m)$ |
| Adjacency list | $O(n+m)$ | $O(\deg)$ | $O(\deg)$ | $O(m)$ | $O(n+m)$ |
| Adjacency set | $O(n+m)$ | $O(1)$ avg | $O(\deg)$ | $O(m)$ | $O(n+m)$ |
| Edge list | $O(m)$ | $O(m)$ | $O(m)$ | $O(m \log m)$ | $O(m)$ |
| COO | $O(m)$ | $O(m)$ | $O(m)$ | $O(m \log m)$ | $O(m)$ |
| CSR | $O(n+m)$ | $O(\deg)$ | $O(\deg)$ | $O(m)$ | $O(m \log m)$ |
| CSC | $O(n+m)$ | $O(\deg)$ | $O(\deg)$ | $O(m)$ | $O(m \log m)$ |
| Incidence | $O(nm)$ | $O(\deg)$ | $O(\deg)$ | - | $O(nm)$ |


B.2 Key Break-Even Points

| Comparison | Break-even condition | Prefer the former when | Prefer the latter when |
|---|---|---|---|
| Dense $A$ vs CSR (space) | $\rho = 1/8$ (int32 CSR) | $\rho > 1/8$ | $\rho < 1/8$ |
| Dense $A$ vs CSR (SpMV) | $\rho = 1$ (equal operation count) | Constants favour dense at high density | Sparse wins for $\rho < 0.5$ |
| Adj. list vs CSR | Never (CSR is at least as fast) | Prototyping | Production code |
| COO vs CSR (SpMV) | A single call | COO is simpler to build | CSR is faster for repeated calls |
| Adj. list vs adj. set | Edge-query frequency | Mostly traversal | Frequent edge queries |

B.3 Conversion Cost Summary

All conversions assume the source representation is already built. The $O(m \log m)$ cost of COO -> CSR comes from the sort step (it can be reduced to $O(m + n)$ with counting sort when vertex IDs are bounded integers).

| Pipeline | Total cost | Bottleneck |
|---|---|---|
| File -> edge list -> COO | $O(m)$ | File I/O |
| File -> edge list -> CSR | $O(m \log m)$ | Sort by row |
| CSR -> PyG edge_index | $O(m)$ | Expand row_ptr |
| NetworkX -> PyG | $O(n + m)$ | Python dict iteration |
| PyG -> NetworkX | $O(n + m)$ | Edge list construction |
| Dense $A$ -> CSR | $O(n^2)$ | Dense scan |
| CSR -> Dense $A$ | $O(n^2 + m)$ | Matrix fill |

For large graphs ($m > 10^7$), the COO -> CSR sort is the dominant cost. Use radix sort (available in CUDA via thrust::sort_by_key) for GPU-accelerated sorting in $O(m)$ time for fixed-width integer keys.

B.4 Memory Footprint Formulas

For a graph with $n$ vertices and $m$ edges, using 32-bit integers and 32-bit floats:

| Format | Memory (bytes) | Example: $n=10^5$, $m=5\times10^5$ |
|---|---|---|
| Dense float32 | $4n^2$ | 40 GB |
| Dense bool | $n^2/8$ | 1.25 GB |
| Adjacency list (Python) | $\approx 80m$ | 40 MB |
| CSR (int32, unweighted) | $4(n+1+m)$ | 2.4 MB |
| CSR (int32 + float32 weights) | $4(n+1+2m)$ | 4.4 MB |
| COO (int64 edge_index) | $16m$ | 8.0 MB |
| LIL (Python, sorted) | $\approx 100m$ | 50 MB |

For reference: 1 GB = $10^9$ bytes. A graph with $n = 10^6$ and $m = 5 \times 10^6$ (a typical social network) requires:

  • Dense: 4 TB (impossible on a single machine)
  • CSR int32: ~24 MB (fits in L3 cache of a modern server)

This roughly 170,000x difference explains why every large-scale graph ML system uses sparse representations exclusively.
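
The table's formulas are simple enough to keep as a helper; a sketch (the function name is ours, the formulas are the ones above):

```python
def graph_memory_bytes(n, m):
    """Footprints from the table above, in bytes."""
    return {
        "dense_float32": 4 * n * n,
        "csr_int32_unweighted": 4 * (n + 1 + m),
        "coo_int64_edge_index": 16 * m,
    }

sizes = graph_memory_bytes(10**6, 5 * 10**6)  # the social-network example
# dense: 4e12 B (4 TB); CSR: ~2.4e7 B (~24 MB); COO: 8e7 B (80 MB)
```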


End of 02 Graph Representations. Proceed to 03 Graph Algorithms to see these representations in action.
