Converted from exercises.ipynb (Math for LLMs, Probability Theory / Markov Chains) for web reading.

Markov Chains — Exercises

Ten exercises covering transition matrices, state classification, stationary distributions, detailed balance, mixing times, Metropolis-Hastings, PageRank, Hidden Markov Models, absorbing chains, and Gibbs sampling.

Each exercise uses three cells:

  • Problem — markdown cell with the task description
  • Your Solution — code cell with scaffolding
  • Solution — code cell with the reference solution and checks

Difficulty Levels

  • ★ (Exercises 1–3) — core mechanics
  • ★★ (Exercises 4–6) — deeper theory
  • ★★★ (Exercises 7–10) — AI / ML applications

Topic Map

  • Exercise 1 — Transition matrices & distribution evolution
  • Exercise 2 — State classification
  • Exercise 3 — Stationary distributions
  • Exercise 4 — Detailed balance & birth-death chains
  • Exercise 5 — Mixing time & spectral gap
  • Exercise 6 — Metropolis-Hastings
  • Exercise 7 — PageRank
  • Exercise 8 — Hidden Markov Models
  • Exercise 9 — Absorbing chains
  • Exercise 10 — Gibbs sampling

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
import numpy.linalg as la

try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)

def header(title):
    print('\n' + '=' * len(title))
    print(title)
    print('=' * len(title))

def check_close(name, got, expected, tol=1e-6):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'}: {name}")
    if not ok:
        print('  expected:', expected)
        print('  got     :', got)
    return ok

def check_true(name, cond):
    print(f"{'PASS' if cond else 'FAIL'}: {name}")
    return cond

def is_stochastic(P):
    return np.all(P >= -1e-12) and np.allclose(P.sum(axis=1), 1)

def tv_distance(p, q):
    return 0.5 * np.abs(p - q).sum()

print('Setup complete.')

Exercise 1 ★ — Transition Matrix and Distribution Evolution

Consider a 3-state weather chain: Sunny (0), Cloudy (1), Rainy (2).

P = \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}

(a) Verify P is a valid stochastic matrix.

(b) Compute \mu_5 = \mu_0 P^5 where \mu_0 = (1, 0, 0) (start Sunny).

(c) Compute the 10-step transition matrix P^{10} and check that its rows approach a common vector.

(d) Verify Chapman-Kolmogorov: P^{m+n} = P^m P^n for m = 3, n = 4.

Code cell 5

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 6

# Solution
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
mu0 = np.array([1.0, 0.0, 0.0])

# (a)
valid = is_stochastic(P)

# (b) mu5 = mu0 @ P^5
P5 = np.linalg.matrix_power(P, 5)
mu5 = mu0 @ P5

# (c) P^10
P10 = np.linalg.matrix_power(P, 10)
row_spread = P10.max(axis=0) - P10.min(axis=0)  # should be small

# (d) Chapman-Kolmogorov
P3 = np.linalg.matrix_power(P, 3)
P4 = np.linalg.matrix_power(P, 4)
P7 = np.linalg.matrix_power(P, 7)
ck_ok = np.allclose(P3 @ P4, P7)

header('Exercise 1: Transition Matrix and Distribution Evolution')
print(f'(a) Valid stochastic matrix: {valid}')
print(f'(b) mu_5 = {mu5}')
print(f'(c) P^10 row spread (should be ~0): {row_spread}')
print(f'(d) Chapman-Kolmogorov P^3 P^4 = P^7: {ck_ok}')

check_true('P is stochastic', valid)
check_true('mu5 sums to 1', np.isclose(mu5.sum(), 1.0))
check_true('P^10 rows converged', np.all(row_spread < 0.01))
check_true('Chapman-Kolmogorov holds', ck_ok)
print('\nTakeaway: Chapman-Kolmogorov is matrix multiplication — LLM token sampling chains '
      'satisfy this, enabling efficient multi-step probability computation.')

Exercise 2 ★ — State Classification

Consider this 5-state transition matrix:

P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}

(a) Identify all communicating classes.

(b) Classify each class as recurrent or transient.

(c) Find all absorbing states.

(d) Compute the absorption probabilities from state 0: probability of being absorbed into state 4.

Code cell 8

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 9

# Solution
P = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0, 1.0]])

# (a)/(b) Communicating classes:
# States 0-3 all communicate (each can reach the others and return).
# State 4 is reachable from 0-3 but cannot leave, so {4} is its own class.
# Classes: {0,1,2,3} transient, {4} recurrent (absorbing).

# (c) Absorbing states: P[i,i] = 1
absorbing = [i for i in range(5) if np.isclose(P[i, i], 1.0)]

# (d) Fundamental matrix: transient states = {0,1,2,3}, absorbing = {4}
# Q = P[transient, transient], R = P[transient, absorbing]
Q_sub = P[:4, :4]
R_sub = P[:4, 4:5]  # only absorbing state is 4
N = np.linalg.inv(np.eye(4) - Q_sub)  # fundamental matrix
B = N @ R_sub  # absorption probabilities
prob_abs_4 = B[0, 0]  # from state 0

# Verify: state 0 is reflecting (P[0,1] = 1) and state 4 is the only absorbing
# state, so absorption at 4 is certain from every transient state.
theory_probs = np.ones(4)

header('Exercise 2: State Classification')
print(f'(a) Communicating classes: {{0,1,2,3}} (transient), {{4}} (recurrent/absorbing)')
print(f'(b) States 0-3: transient; state 4: recurrent (absorbing)')
print(f'(c) Absorbing states: {absorbing}')
print(f'(d) Absorption probabilities from each transient state: {B.flatten()}')
print(f'    Theory (absorption is certain): {theory_probs}')
print(f'    From state 0: P(absorbed at 4) = {prob_abs_4:.4f}')

check_true('State 4 is absorbing', absorbing == [4])
check_close('Absorption probs match theory', B.flatten(), theory_probs, tol=1e-5)
print('\nTakeaway: Fundamental matrix N=(I-Q)^{-1} encodes expected visits before absorption — '
      'directly used in RL to compute expected returns in episodic MDPs.')
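
The visit-count reading of N can also be checked by simulation. A minimal sketch (the name `N_fund` and the run count are illustrative, not from the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0, 1.0]])

# Theoretical expected visits to each transient state, starting from state 0
Q = P[:4, :4]
N_fund = np.linalg.inv(np.eye(4) - Q)      # fundamental matrix
expected_visits = N_fund[0]                # row for start state 0

# Monte Carlo: run the chain to absorption, counting visits to states 0..3
n_runs = 20000
visit_counts = np.zeros(4)
for _ in range(n_runs):
    s = 0
    while s != 4:                          # state 4 is the absorbing state
        visit_counts[s] += 1
        s = rng.choice(5, p=P[s])
visit_counts /= n_runs
```

For this chain the exact expected-visit vector from state 0 works out to (4, 6, 4, 2), so the expected time to absorption is 16 steps.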

Exercise 3 ★ — Computing Stationary Distributions

For the ergodic chain:

P = \begin{pmatrix} 0.1 & 0.6 & 0.3 \\ 0.4 & 0.2 & 0.4 \\ 0.7 & 0.1 & 0.2 \end{pmatrix}

(a) Find \pi by solving the linear system \pi P = \pi, \sum_i \pi_i = 1.

(b) Find \pi via power iteration starting from \mu_0 = (1/3, 1/3, 1/3).

(c) Find \pi from the dominant left eigenvector of P.

(d) Verify all three methods agree.

Code cell 11

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 12

# Solution
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.7, 0.1, 0.2]])

# (a) Linear system: solve (P^T - I) pi = 0 subject to sum(pi) = 1
A = P.T - np.eye(3)
A[-1] = 1  # replace last eq with normalisation
b = np.zeros(3); b[-1] = 1
pi_linear = np.linalg.solve(A, b)

# (b) Power iteration
mu = np.ones(3) / 3
for _ in range(1000):
    mu = mu @ P
pi_power = mu

# (c) Left eigenvector of P (eigenvalue 1 of P^T)
vals, vecs = np.linalg.eig(P.T)
idx = np.argmax(np.real(vals))
pi_eig = np.real(vecs[:, idx])
pi_eig = pi_eig / pi_eig.sum()

header('Exercise 3: Stationary Distributions')
print(f'(a) Linear system:   pi = {pi_linear}')
print(f'(b) Power iteration: pi = {pi_power}')
print(f'(c) Left eigenvec:   pi = {pi_eig}')
print(f'\npi P = pi check: {pi_linear @ P}')
print(f'(should equal):  {pi_linear}')

check_true('pi P = pi (linear)', np.allclose(pi_linear @ P, pi_linear))
check_close('Power iter agrees with linear', pi_power, pi_linear)
check_close('Eigenvec agrees with linear', pi_eig, pi_linear)
check_true('pi sums to 1', np.isclose(pi_linear.sum(), 1.0))
print('\nTakeaway: All three methods give the same pi; power iteration is the '
      'algorithm behind PageRank at web scale.')

Exercise 4 ★★ — Detailed Balance and Birth-Death Chains

A birth-death chain on states \{0, 1, 2, 3, 4\} has p_i = 0.4 (right) and q_i = 0.6 (left) for i = 1, 2, 3, with reflecting barriers: p_0 = 1, q_4 = 1.

(a) Write the transition matrix P.

(b) Use detailed balance \pi_i p_i = \pi_{i+1} q_{i+1} to find the stationary distribution.

(c) Verify P satisfies detailed balance with your \pi.

(d) Is this chain reversible? Explain.

(e) Compare the convergence rate to the symmetric (unbiased) chain with p_i = q_i = 0.5.

Code cell 14

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 15

# Solution
p = 0.4; q = 0.6; N = 5

# (a) Transition matrix
P = np.zeros((N, N))
P[0, 1] = 1.0          # reflecting at 0
P[N-1, N-2] = 1.0      # reflecting at N-1
for i in range(1, N-1):
    P[i, i+1] = p
    P[i, i-1] = q

# (b) Detailed balance: pi[i] * P[i,i+1] = pi[i+1] * P[i+1,i]
r = p / q
pi_unnorm = np.zeros(N)
pi_unnorm[0] = 1.0
# Edge 0-1: P[0,1] = 1, P[1,0] = q  =>  pi[1] = pi[0] / q
pi_unnorm[1] = pi_unnorm[0] / q
# Interior edges: P[i,i+1] = p, P[i+1,i] = q  =>  pi[i+1] = pi[i] * (p/q)
for i in range(1, N-2):
    pi_unnorm[i+1] = pi_unnorm[i] * r
# Edge 3-4: P[3,4] = p, P[4,3] = 1  =>  pi[4] = pi[3] * p
pi_unnorm[N-1] = pi_unnorm[N-2] * p
pi_db = pi_unnorm / pi_unnorm.sum()

# (c) Check detailed balance: pi[i]*P[i,j] = pi[j]*P[j,i] for all i,j
db_ok = True
for i in range(N):
    for j in range(N):
        if not np.isclose(pi_db[i]*P[i,j], pi_db[j]*P[j,i]):
            db_ok = False
            break

# (e) Compare convergence rates. Both chains are bipartite (no self-loops), so
# -1 is an eigenvalue and the absolute spectral gap is 0; compare the
# second-largest *real* eigenvalue instead.
evals_biased = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
gap_biased = 1 - evals_biased[1]

P_sym = np.zeros((N, N))
P_sym[0, 1] = 1.0; P_sym[N-1, N-2] = 1.0
for i in range(1, N-1):
    P_sym[i, i+1] = 0.5; P_sym[i, i-1] = 0.5
evals_sym = np.sort(np.real(np.linalg.eigvals(P_sym)))[::-1]
gap_sym = 1 - evals_sym[1]

header('Exercise 4: Detailed Balance and Birth-Death Chains')
print(f'(b) Stationary distribution: {pi_db}')
print(f'    Verify pi P = pi: {np.allclose(pi_db @ P, pi_db)}')
print(f'(c) Detailed balance satisfied: {db_ok}')
print(f'(d) Chain IS reversible (birth-death chains always are)')
print(f'(e) Spectral gap — biased (p=0.4): {gap_biased:.4f}, symmetric (p=0.5): {gap_sym:.4f}')

check_true('P is stochastic', is_stochastic(P))
check_true('Detailed balance satisfied', db_ok)
check_true('pi P = pi', np.allclose(pi_db @ P, pi_db))
check_true('pi sums to 1', np.isclose(pi_db.sum(), 1.0))
print('\nTakeaway: Detailed balance is what makes Metropolis-Hastings correct — '
      'constructing proposals that satisfy pi(x)q(x->y) = pi(y)q(y->x) guarantees pi is stationary.')

Exercise 5 ★★ — Mixing Time and Spectral Gap

For the lazy random walk on the cycle C_n with n = 8 states: P_{ii} = 1/2, P_{i,i\pm 1} = 1/4 (indices mod n).

(a) Compute the spectral gap \text{gap} = 1 - \lambda_2.

(b) Compute the empirical mixing time t_{\text{mix}}(\varepsilon = 0.25) by tracking the TV distance from state 0.

(c) Compare to the spectral bound t_{\text{mix}} \leq \lceil \log(1/(\varepsilon \pi_{\min})) / \text{gap} \rceil.

(d) Show that the non-lazy cycle (P_{i,i\pm 1} = 1/2) has a larger gap 1 - \lambda_2 yet never converges. Why?

Code cell 17

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 18

# Solution
import numpy as np

n = 8

# (a) Lazy cycle
P_lazy = np.zeros((n, n))
for i in range(n):
    P_lazy[i, i] = 0.5
    P_lazy[i, (i+1) % n] = 0.25
    P_lazy[i, (i-1) % n] += 0.25

evals = np.sort(np.real(np.linalg.eigvals(P_lazy)))[::-1]
gap = 1 - evals[1]

# (b) Empirical mixing time
pi_stat = np.ones(n) / n
eps = 0.25
mu = np.zeros(n); mu[0] = 1.0
t_mix_emp = 0
for t in range(1, 500):
    mu = mu @ P_lazy
    if 0.5 * np.abs(mu - pi_stat).sum() <= eps:
        t_mix_emp = t
        break

# (c) Spectral bound
import math
pi_min = 1.0 / n
t_mix_bound = math.ceil(math.log(1 / (eps * pi_min)) / gap)

# (d) Non-lazy cycle
P_nonlazy = np.zeros((n, n))
for i in range(n):
    P_nonlazy[i, (i+1) % n] = 0.5
    P_nonlazy[i, (i-1) % n] = 0.5
evals_nl = np.sort(np.real(np.linalg.eigvals(P_nonlazy)))[::-1]
# Non-lazy cycle has eigenvalue -1 (period 2), so gap_nl = 1 - evals_nl[1]
# but the chain is periodic → doesn't converge!
gap_nl = 1 - evals_nl[1]
has_neg_eval = np.any(evals_nl < -0.9)

header('Exercise 5: Mixing Time and Spectral Gap')
print(f'(a) Spectral gap (lazy cycle n=8): {gap:.6f}')
print(f'(b) Empirical mixing time (eps=0.25): {t_mix_emp}')
print(f'(c) Spectral bound: {t_mix_bound}')
print(f'(d) Non-lazy eigenvalues include -1: {has_neg_eval} → periodic, does not converge')
print(f'    Lazy chain adds self-loop to break periodicity — standard trick for MCMC')

check_true('Gap > 0 (lazy is aperiodic)', gap > 0)
check_true('Empirical t_mix <= bound', t_mix_emp <= t_mix_bound)
check_true('Non-lazy has eigenvalue -1 (periodic)', has_neg_eval)
print('\nTakeaway: The lazy trick (add I/2) is used in MCMC implementations to '
      'guarantee aperiodicity without changing the stationary distribution.')
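
The lazy transform is literally one line: P_lazy = (I + P)/2. A quick sketch on the non-lazy cycle from part (d), checking that laziness preserves pi while removing the eigenvalue -1:

```python
import numpy as np

n = 8
P = np.zeros((n, n))                    # non-lazy cycle: bipartite, periodic
for i in range(n):
    P[i, (i + 1) % n] = 0.5
    P[i, (i - 1) % n] = 0.5

P_lazy = 0.5 * (np.eye(n) + P)          # maps each eigenvalue lam to (1+lam)/2

pi = np.ones(n) / n                     # uniform is stationary for both chains
evals_nonlazy = np.sort(np.real(np.linalg.eigvals(P)))
evals_lazy = np.sort(np.real(np.linalg.eigvals(P_lazy)))
```

Since every eigenvalue is mapped to (1 + lam)/2, the lazy spectrum is nonnegative, so the chain is aperiodic and power iteration converges.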

Exercise 6 ★★ — Metropolis-Hastings

Target: \pi(x) \propto e^{-x^4/4 + x^2/2} (a double-well potential). Proposal: q(x' \mid x) = \mathcal{N}(x, \sigma^2) (symmetric, so MH reduces to plain Metropolis).

(a) Implement the Metropolis algorithm and draw N = 10000 samples with \sigma = 0.5.

(b) Estimate the acceptance rate. Scaling theory suggests roughly 44% is optimal for one-dimensional targets (the well-known 23.4% figure is the high-dimensional limit).

(c) Verify the sampler visits both modes near x \approx \pm 1.

(d) Verify detailed balance numerically: for a discretised version on [-3, 3] with 30 bins, check \pi(x) P(x, x') \approx \pi(x') P(x', x).

Code cell 20

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 21

# Solution
import numpy as np
np.random.seed(42)

def log_pi(x):
    return -x**4/4 + x**2/2

sigma = 0.5; N = 10000

# (a) Metropolis
samples = []
accepts = 0
x = 0.0
for _ in range(N):
    x_prop = x + sigma * np.random.randn()
    log_alpha = log_pi(x_prop) - log_pi(x)
    if np.log(np.random.rand()) < log_alpha:
        x = x_prop
        accepts += 1
    samples.append(x)
samples = np.array(samples)

# (b) Acceptance rate
acc_rate = accepts / N

# (c) Mode visitation (burn-in 2000)
s = samples[2000:]
frac_pos = np.mean(s > 0.5)
frac_neg = np.mean(s < -0.5)

# (d) Detailed balance on grid
grid = np.linspace(-3, 3, 30)
pi_unnorm = np.exp(log_pi(grid))
pi_grid = pi_unnorm / pi_unnorm.sum()
# Transition kernel on grid: Gaussian proposals, accept/reject
# Approximate: P(i->j) = phi((grid[j]-grid[i])/sigma) * min(1, pi[j]/pi[i])
from scipy.stats import norm
dg = grid[1] - grid[0]
P_grid = np.zeros((30, 30))
for i in range(30):
    for j in range(30):
        if i != j:
            P_grid[i,j] = norm.pdf(grid[j]-grid[i], 0, sigma)*dg * min(1, pi_unnorm[j]/pi_unnorm[i])
    P_grid[i,i] = max(0, 1 - P_grid[i].sum())
# Check DB
db_errors = []
for i in range(30):
    for j in range(30):
        db_errors.append(abs(pi_grid[i]*P_grid[i,j] - pi_grid[j]*P_grid[j,i]))
max_db_err = max(db_errors)

header('Exercise 6: Metropolis-Hastings')
print(f'(b) Acceptance rate: {acc_rate:.3f} (1-D optimal scaling is roughly 0.44)')
print(f'(c) Fraction near +mode (>0.5): {frac_pos:.3f}')
print(f'    Fraction near -mode (<-0.5): {frac_neg:.3f}')
print(f'(d) Max detailed balance error: {max_db_err:.6f} (should be ~0)')

check_true('Sampler visits positive mode', frac_pos > 0.2)
check_true('Sampler visits negative mode', frac_neg > 0.2)
check_true('Detailed balance holds', max_db_err < 0.01)
print('\nTakeaway: MH is the backbone of Bayesian posterior sampling — the '
      'acceptance step is exactly what makes the chain leave any target distribution invariant.')
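
The exercise fixes sigma = 0.5; sweeping the proposal scale exposes the usual trade-off (small steps accept often but explore slowly, large steps are mostly rejected). A sketch, with `metropolis_acc_rate` an illustrative helper rather than part of the exercise:

```python
import numpy as np

def log_pi(x):
    return -x**4 / 4 + x**2 / 2         # double-well target (unnormalised)

def metropolis_acc_rate(sigma, n_steps=5000, seed=0):
    """Run a random-walk Metropolis chain; return its acceptance rate."""
    rng = np.random.default_rng(seed)
    x, accepts = 0.0, 0
    for _ in range(n_steps):
        x_prop = x + sigma * rng.normal()
        if np.log(rng.random()) < log_pi(x_prop) - log_pi(x):
            x = x_prop
            accepts += 1
    return accepts / n_steps

rates = {s: metropolis_acc_rate(s) for s in (0.1, 0.5, 2.0, 5.0)}
```

Acceptance falls as sigma grows, and neither extreme mixes well, which is why scaling rules target an intermediate rate.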

Exercise 7 ★★★ — PageRank

Consider a 6-page web graph:

A = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}

where A_{ij} = 1 means page i links to page j.

(a) Build the Google matrix G = \alpha D^{-1} A + \frac{1 - \alpha}{n} \mathbf{1}\mathbf{1}^T with \alpha = 0.85.

(b) Find the PageRank scores via power iteration.

(c) Find the PageRank scores via the dominant left eigenvector of G and verify they agree.

(d) Identify the most and least important pages.

(e) Compute the convergence rate (spectral gap of G) and verify |\lambda_2(G)| = \alpha \, |\lambda_2(D^{-1}A)|.

Code cell 23

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 24

# Solution
import numpy as np

A = np.array([
    [0,1,0,1,0,0],
    [0,0,1,0,1,0],
    [0,0,0,1,0,1],
    [1,0,0,0,1,0],
    [0,1,0,0,0,1],
    [1,0,1,0,0,0],
], dtype=float)
n = 6; alpha = 0.85

# (a) PageRank Google matrix
out_deg = A.sum(axis=1)
D_inv_A = A / out_deg[:, None]
G = alpha * D_inv_A + (1 - alpha) / n * np.ones((n, n))

# (b) Power iteration
pi = np.ones(n) / n
for _ in range(200):
    pi = pi @ G
pi_pr = pi

# (c) Left eigenvector
vals, vecs = np.linalg.eig(G.T)
idx = np.argmax(np.real(vals))
pi_eig = np.real(vecs[:, idx])
pi_eig = pi_eig / pi_eig.sum()
# Align sign
if pi_eig[0] < 0: pi_eig = -pi_eig

# (d) Ranking
ranks = np.argsort(pi_pr)[::-1]

# (e) Convergence rate
evals_G = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]   # moduli, incl. complex pairs
gap_G = 1 - evals_G[1]
evals_rw = np.sort(np.abs(np.linalg.eigvals(D_inv_A)))[::-1]
theory_gap = 1 - alpha * evals_rw[1]

header('Exercise 7: PageRank')
print(f'PageRank scores: {pi_pr}')
print(f'Page ranking (most to least): {ranks + 1}')
print(f'Most important: page {ranks[0]+1} ({pi_pr[ranks[0]]:.4f})')
print(f'Least important: page {ranks[-1]+1} ({pi_pr[ranks[-1]]:.4f})')
print(f'Convergence gap: {gap_G:.6f}, theory alpha*lambda2: {theory_gap:.6f}')

check_true('G is stochastic', np.allclose(G.sum(axis=1), 1))
check_true('pi sums to 1', np.isclose(pi_pr.sum(), 1.0))
check_close('Power iter and eigenvec agree', pi_pr, pi_eig)
check_true('Gap matches theory', abs(gap_G - theory_gap) < 0.01)
print('\nTakeaway: PageRank is power iteration on a Markov chain — the damping factor '
      'alpha=0.85 controls the spectral gap and thus convergence speed.')
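
The damping claim can be made concrete by sweeping alpha on the same graph; `google_gap` is an illustrative helper, not part of the exercise:

```python
import numpy as np

A = np.array([[0, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [1, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 0, 0]], dtype=float)
n = 6
W = A / A.sum(axis=1, keepdims=True)    # random-walk matrix D^{-1} A

def google_gap(alpha):
    """Spectral gap 1 - |lambda_2| of the damped Google matrix."""
    G = alpha * W + (1 - alpha) / n * np.ones((n, n))
    mags = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
    return 1.0 - mags[1]

gaps = {a: google_gap(a) for a in (0.5, 0.85, 0.99)}
```

Since |lambda_2(G)| = alpha |lambda_2(W)|, pushing alpha toward 1 shrinks the gap and slows power iteration, which is the practical reason for the 0.85 default.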

Exercise 8 ★★★ — Hidden Markov Models

A 3-state HMM models DNA sequence (CpG islands):

  • Hidden states: \{H, L, N\} (high-GC island, low-GC island, neutral)
  • Observations: \{A, C, G, T\}

A = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.1 & 0.7 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}, \quad B = \begin{pmatrix} 0.1 & 0.4 & 0.4 & 0.1 \\ 0.3 & 0.2 & 0.2 & 0.3 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}

\pi_0 = (0.5, 0.3, 0.2)

(a) Implement the forward algorithm to compute P(o_{1:T}) for the sequence ACGTATCG.

(b) Implement Viterbi decoding to find the most likely hidden state sequence.

(c) Cross-check the scaled (numerically stable) forward recursion against the unscaled version.

(d) Compute the posterior state probabilities P(z_t \mid o_{1:T}) at each step.

Code cell 26

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 27

# Solution
import numpy as np

A_hmm = np.array([[0.6, 0.2, 0.2],
                   [0.1, 0.7, 0.2],
                   [0.2, 0.3, 0.5]])
B_hmm = np.array([[0.1, 0.4, 0.4, 0.1],
                   [0.3, 0.2, 0.2, 0.3],
                   [0.25,0.25,0.25,0.25]])
pi0 = np.array([0.5, 0.3, 0.2])
obs_seq = [0, 1, 2, 3, 0, 3, 1, 2]
S, T = 3, len(obs_seq)

# (a) Forward algorithm (scaled)
alpha = np.zeros((T, S))
scales = np.zeros(T)
alpha[0] = pi0 * B_hmm[:, obs_seq[0]]
scales[0] = alpha[0].sum()
alpha[0] /= scales[0]
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A_hmm) * B_hmm[:, obs_seq[t]]
    scales[t] = alpha[t].sum()
    alpha[t] /= scales[t]
log_prob = np.log(scales).sum()

# (b) Viterbi
delta = np.zeros((T, S))
psi = np.zeros((T, S), dtype=int)
delta[0] = np.log(pi0 + 1e-300) + np.log(B_hmm[:, obs_seq[0]] + 1e-300)
for t in range(1, T):
    for s in range(S):
        trans = delta[t-1] + np.log(A_hmm[:, s] + 1e-300)
        psi[t, s] = np.argmax(trans)
        delta[t, s] = trans[psi[t, s]] + np.log(B_hmm[s, obs_seq[t]] + 1e-300)
# Backtrack
best_path = np.zeros(T, dtype=int)
best_path[T-1] = np.argmax(delta[T-1])
for t in range(T-2, -1, -1):
    best_path[t] = psi[t+1, best_path[t+1]]

# (c) Unscaled forward for comparison
alpha_raw = np.zeros((T, S))
alpha_raw[0] = pi0 * B_hmm[:, obs_seq[0]]
for t in range(1, T):
    alpha_raw[t] = (alpha_raw[t-1] @ A_hmm) * B_hmm[:, obs_seq[t]]
log_prob_raw = np.log(alpha_raw[T-1].sum())

# (d) Backward + posterior
beta = np.zeros((T, S))
beta[T-1] = 1.0
for t in range(T-2, -1, -1):
    beta[t] = A_hmm @ (B_hmm[:, obs_seq[t+1]] * beta[t+1])
# Posterior gamma2[t, s] = P(z_t = s | o_{1:T}) from unscaled alpha and beta
gamma2 = alpha_raw * beta
gamma2 /= gamma2.sum(axis=1, keepdims=True)

state_names = ['H', 'L', 'N']
obs_names = ['A', 'C', 'G', 'T']
seq_str = ''.join(obs_names[o] for o in obs_seq)
path_str = ''.join(state_names[s] for s in best_path)

header('Exercise 8: Hidden Markov Models')
print(f'Sequence: {seq_str}')
print(f'(a) log P(obs): {log_prob:.4f}')
print(f'(b) Viterbi path: {path_str}')
print(f'(c) Scaled vs unscaled log-prob: {log_prob:.4f} vs {log_prob_raw:.4f}')
print(f'(d) Posterior at t=0: {gamma2[0]} (H, L, N)')

check_close('Scaled and unscaled log-prob agree', log_prob, log_prob_raw, tol=1e-3)
check_true('Posteriors sum to 1 at each step', np.allclose(gamma2.sum(axis=1), 1.0))
check_true('Viterbi path has valid state indices', all(0 <= s < 3 for s in best_path))
print('\nTakeaway: The forward algorithm is O(T|S|^2) — the same dynamic programming '
      'structure underlies CTC decoding in speech models and message passing in chain-structured graphical models.')
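
For a sequence this short the recursion can also be cross-checked by brute force, summing the probability of every one of the 3^8 hidden paths. A self-contained sketch with the same model:

```python
import numpy as np
from itertools import product

A = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.1, 0.4, 0.4, 0.1],
              [0.3, 0.2, 0.2, 0.3],
              [0.25, 0.25, 0.25, 0.25]])
pi0 = np.array([0.5, 0.3, 0.2])
obs = [0, 1, 2, 3, 0, 3, 1, 2]          # ACGTATCG
T = len(obs)

# Forward recursion: O(T |S|^2)
alpha = pi0 * B[:, obs[0]]
for t in range(1, T):
    alpha = (alpha @ A) * B[:, obs[t]]
p_forward = alpha.sum()

# Brute force: O(|S|^T), feasible only because T = 8 (3^8 = 6561 paths)
p_brute = 0.0
for path in product(range(3), repeat=T):
    p = pi0[path[0]] * B[path[0], obs[0]]
    for t in range(1, T):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    p_brute += p
```

The two numbers agree to machine precision, which is a handy sanity check when implementing the recursion for the first time.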

Exercise 9 ★★★ — Absorbing Chains and the Fundamental Matrix

An absorbing Markov chain has transient states 0 and 1 and absorbing states 2 and 3, with transition matrix

P = \begin{pmatrix} 0.4 & 0.4 & 0.2 & 0.0 \\ 0.2 & 0.5 & 0.0 & 0.3 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 1.0 \end{pmatrix}

(a) Partition P into Q and R.

(b) Compute the fundamental matrix N = (I - Q)^{-1}.

(c) Compute the absorption probabilities B = NR and the expected time to absorption.

(d) Interpret the result as expected time to termination in a sequential decision process.

Code cell 29

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 30

# Solution
P_abs = np.array([
    [0.4, 0.4, 0.2, 0.0],
    [0.2, 0.5, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
Q = P_abs[:2, :2]
R = P_abs[:2, 2:]
Nmat = np.linalg.inv(np.eye(2) - Q)
B = Nmat @ R
t_abs = Nmat @ np.ones(2)

header('Exercise 9: Absorbing Chains')
print('Fundamental matrix N:')
print(Nmat.round(4))
print('Absorption probabilities B:')
print(B.round(4))
print(f'Expected time to absorption: {t_abs.round(4)}')
check_true('Absorption probabilities sum to 1', np.allclose(B.sum(axis=1), 1.0))
check_true('Expected absorption times are positive', np.all(t_abs > 0))
print('Takeaway: the fundamental matrix counts expected visits before absorption, which is useful for episodic RL and reliability models.')

Exercise 10 ★★★ — Gibbs Sampling for a Correlated Gaussian

Let (X, Y) be a zero-mean bivariate Gaussian with unit variances and correlation \rho = 0.8.

(a) Derive the conditionals X \mid Y = y \sim \mathcal{N}(\rho y, 1 - \rho^2) and similarly for Y \mid X = x.

(b) Implement Gibbs sampling using these conditionals.

(c) Estimate the sample covariance matrix and compare with the target.

(d) Explain why Gibbs sampling is a Markov chain with the target distribution as its stationary distribution.

Code cell 32

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 33

# Solution
rho = 0.8
n = 30000
burn = 2000
samples = np.zeros((n, 2))
x, y = 0.0, 0.0
cond_sd = np.sqrt(1 - rho**2)
for t in range(n):
    x = np.random.normal(rho * y, cond_sd)
    y = np.random.normal(rho * x, cond_sd)
    samples[t] = [x, y]
post = samples[burn:]
emp_cov = np.cov(post.T)
target_cov = np.array([[1.0, rho], [rho, 1.0]])

header('Exercise 10: Gibbs Sampling for a Correlated Gaussian')
print('Empirical covariance:')
print(emp_cov.round(4))
print('Target covariance:')
print(target_cov)
check_true('Mean is near zero', np.all(np.abs(post.mean(axis=0)) < 0.04))
check_true('Covariance is close to target', np.allclose(emp_cov, target_cov, atol=0.04))
print('Takeaway: each Gibbs step leaves the joint distribution invariant, so repeated conditional sampling builds an MCMC chain.')
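
The claim in (d) can be tested directly: feed exact draws from the target through one Gibbs sweep and check that the distribution does not move. A sketch (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8
target_cov = np.array([[1.0, rho], [rho, 1.0]])
cond_sd = np.sqrt(1 - rho**2)

# Exact samples from the target N(0, target_cov)
z = rng.multivariate_normal(np.zeros(2), target_cov, size=100_000)

# One Gibbs sweep applied to every sample: x | y, then y | new x
x = rng.normal(rho * z[:, 1], cond_sd)
y = rng.normal(rho * x, cond_sd)
swept = np.column_stack([x, y])

cov_after = np.cov(swept.T)             # should still match target_cov
```

The covariance after the sweep matches the target to Monte Carlo accuracy, which is exactly what stationarity means: if you start in the target, one transition keeps you there.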

What to Review After Finishing

  • Transition matrices — can you verify stochasticity and compute n-step distributions?
  • State classification — can you identify communicating classes and use the fundamental matrix?
  • Stationary distributions — can you use all three methods (linear system, power iteration, eigenvector)?
  • Detailed balance — can you verify reversibility and derive stationary from DB equations?
  • Mixing time — can you compute spectral gap and understand the lazy chain trick?
  • Metropolis-Hastings — can you implement the acceptance step and verify detailed balance?
  • PageRank — can you build the Google matrix and interpret convergence rate?
  • HMM — can you implement forward algorithm, Viterbi, and backward algorithm?

  • Absorbing chains — can you partition P, compute N = (I - Q)^{-1}, and read off absorption probabilities and times?
  • Gibbs sampling — can you derive the conditionals and explain why the target distribution is stationary?

References

  1. Levin, Peres, Wilmer — Markov Chains and Mixing Times (2017)
  2. Norris — Markov Chains (Cambridge, 1997)
  3. Bishop — Pattern Recognition and Machine Learning, Ch. 13 (HMMs)
  4. Roberts & Rosenthal — 'Optimal Scaling for Various Metropolis-Hastings Algorithms' (2001)
  5. Page et al. — 'The PageRank Citation Ranking: Bringing Order to the Web' (1998)
  6. Sutton & Barto — Reinforcement Learning: An Introduction (2018)