Math for LLMs — Exercises Notebook
Converted from exercises.ipynb for web reading.

Entropy — Exercises

10 exercises covering Shannon entropy, MaxEnt, Huffman coding, entropy rate, differential entropy, Rényi entropy, and MaxEnt RL; exercises 9–10 are short follow-up drills.

Format:

  • Problem — markdown cell with task description
  • Your Solution — code cell with scaffolding
  • Solution — code cell with reference solution and checks

Difficulty Levels

  • ★ (Exercises 1–3) — Core entropy mechanics
  • ★★ (Exercises 4–6) — Deeper theory: entropy rate, Huffman, differential
  • ★★★ (Exercises 7–8) — AI/ML applications

Topic Map

  • Shannon entropy, binary entropy function — Exercise 1
  • Maximum entropy / Lagrange multipliers — Exercise 2
  • Joint and conditional entropy, chain rule — Exercise 3
  • Entropy rate of Markov chains — Exercise 4
  • Huffman coding, source coding theorem — Exercise 5
  • Differential entropy, Gaussian MaxEnt — Exercise 6
  • Rényi entropy family — Exercise 7
  • MaxEnt RL entropy bonus — Exercise 8

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
from scipy.special import entr
import heapq

try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)

def header(title):
    print('\n' + '=' * len(title))
    print(title)
    print('=' * len(title))

def check_close(name, got, expected, tol=1e-6):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} {name}")
    if not ok:
        print(f'  expected: {expected}')
        print(f'  got:      {got}')
    return ok

def check_true(name, cond):
    print(f"{'PASS' if cond else 'FAIL'} {name}")
    return cond

def softmax(logits, T=1.0):
    logits = np.asarray(logits, dtype=float) / T
    logits = logits - logits.max()
    e = np.exp(logits)
    return e / e.sum()

def H(p):
    """Shannon entropy in nats."""
    return np.sum(entr(np.asarray(p, dtype=float)))

def H_bits(p):
    """Shannon entropy in bits."""
    return H(p) / np.log(2)

print('Setup complete. Helper functions ready.')

Exercise 1 ★ — Binary Entropy Function and Token Distributions

Part (a). Compute H(X) in bits for:

  • (i) Fair coin: p = (0.5, 0.5)
  • (ii) Biased coin: p = (0.1, 0.9)
  • (iii) Fair 6-sided die: p_i = 1/6 for i = 1, \ldots, 6
  • (iv) Distribution (0.5, 0.25, 0.125, 0.125)

Part (b). Implement the binary entropy function h(p) = -p \log_2 p - (1-p) \log_2(1-p) and verify h(0) = h(1) = 0 and h(0.5) = 1 bit.

Part (c). A language model outputs a distribution over n = 32,000 tokens. Compute the maximum possible entropy in nats. Then compute the entropy when the top token has probability 0.95 and the remaining 31,999 tokens share the remaining probability uniformly.

Code cell 5

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 6

# Solution
# Exercise 1: Solution

def compute_entropy_bits_sol(p):
    return H_bits(p)

def binary_entropy_sol(p):
    if p <= 0 or p >= 1:
        return 0.0
    return -p*np.log2(p) - (1-p)*np.log2(1-p)

def lm_entropy_sol(n_vocab, top_prob):
    rest_prob = (1 - top_prob) / (n_vocab - 1)
    p = np.concatenate([[top_prob], np.full(n_vocab-1, rest_prob)])
    return H(p)  # in nats

header('Exercise 1: Binary Entropy and Token Distributions')

distributions = {
    'fair_coin':   (np.array([0.5, 0.5]),   1.0),
    'biased_coin': (np.array([0.1, 0.9]),   None),
    'fair_die':    (np.ones(6)/6,            np.log2(6)),
    'categorical': (np.array([0.5,0.25,0.125,0.125]), 1.75),
}
for name, (p, expected) in distributions.items():
    h = compute_entropy_bits_sol(p)
    print(f'  {name}: H = {h:.4f} bits' + (f' (expected {expected:.4f})' if expected else ''))

print()
check_close('h(0) = 0', binary_entropy_sol(1e-10), 0.0, tol=0.01)
check_close('h(1) = 0', binary_entropy_sol(1-1e-10), 0.0, tol=0.01)
check_close('h(0.5) = 1 bit', binary_entropy_sol(0.5), 1.0)

print()
H_max = np.log(32000)
H_top95 = lm_entropy_sol(32000, 0.95)
print(f'  Max entropy (uniform 32k): {H_max:.4f} nats')
print(f'  H (top token p=0.95):      {H_top95:.4f} nats')
check_true('LM entropy with top=0.95 << log(32000)', H_top95 < H_max/2)

print('\nTakeaway: entropy decreases rapidly when one token dominates. '
      'A model placing 95% probability on one token has very low entropy — '
      'it is essentially deterministic.')
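To make the collapse quantitative, a short sweep (a self-contained sketch mirroring lm_entropy_sol, using the closed form for a spike-plus-uniform-tail distribution) tabulates entropy against the top-token probability:

```python
import numpy as np

def spiked_entropy(n_vocab, top_prob):
    """Entropy in nats of one spike plus a uniform tail over the rest."""
    rest = (1.0 - top_prob) / (n_vocab - 1)
    # The tail contributes (n_vocab - 1) identical terms -rest*log(rest)
    return -top_prob * np.log(top_prob) - (n_vocab - 1) * rest * np.log(rest)

for tp in [0.1, 0.5, 0.9, 0.99]:
    print(f"top p = {tp:4}: H = {spiked_entropy(32000, tp):7.4f} nats")
```

Even at top p = 0.5 the entropy is already well below the log(32000) ≈ 10.37-nat maximum.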

Exercise 2 ★ — Maximum Entropy via Lagrange Multipliers

Part (a). Show numerically that the uniform distribution p_i = 1/n maximizes H(\mathbf{p}) subject to \sum_i p_i = 1 and p_i \ge 0. Run constrained optimization from 10 different random starting points and verify the optimizer always finds p^* = (1/n, \ldots, 1/n).

Part (b). Find the maximum entropy distribution over \{0, 1, 2, \ldots, 9\} subject to \mathbb{E}[X] = \mu = 3. Implement the Lagrange multiplier solution p^*(k) \propto e^{-\lambda k} and find \lambda such that \mathbb{E}_{p^*}[X] = 3.

Part (c). Verify the MaxEnt inequality: compare your solution's entropy to 5 other distributions satisfying the same mean constraint and confirm yours has the highest entropy.

Code cell 8

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 9

# Solution
# Exercise 2: Solution

from scipy.optimize import minimize, brentq

def maxent_uniform_verify_sol(n, n_starts=10):
    results = []
    for seed in range(n_starts):
        np.random.seed(seed)
        x0 = np.random.dirichlet(np.ones(n))
        res = minimize(lambda p: -H(p), x0,
                       method='SLSQP',
                       bounds=[(1e-10,1)]*n,
                       constraints={'type':'eq','fun': lambda p: p.sum()-1})
        results.append(res.x)
    return results

def find_lambda_sol(mu, support):
    def mean_diff(lam):
        logits = -lam * support
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return np.dot(p, support) - mu
    return brentq(mean_diff, -10, 10)

header('Exercise 2: Maximum Entropy via Lagrange Multipliers')

# Part (a)
n = 5
optimal_dists = maxent_uniform_verify_sol(n)
uniform_target = np.ones(n)/n
all_uniform = all(np.allclose(d, uniform_target, atol=1e-4) for d in optimal_dists)
check_true('All 10 starts converge to uniform distribution', all_uniform)
print(f'  Optimal entropy: {H_bits(optimal_dists[0]):.4f} bits = log2({n}) = {np.log2(n):.4f}')

# Part (b)
mu_target = 3.0
support = np.arange(10)
lam = find_lambda_sol(mu_target, support)
logits = -lam * support
p_maxent = np.exp(logits - logits.max())
p_maxent /= p_maxent.sum()
mean_check = np.dot(p_maxent, support)
check_close('E[X] = 3 for MaxEnt distribution', mean_check, mu_target)
print(f'  lambda = {lam:.4f}')
print(f'  p_maxent = {p_maxent.round(4)}')
print(f'  H(p_maxent) = {H(p_maxent):.4f} nats')

# Part (c): compare to alternatives with the same mean E[X] = 3
def two_point(i, j, mu):
    # Mixture of point masses at i and j with mean mu
    p = np.zeros(10)
    w = (j - mu) / (j - i)
    p[i], p[j] = w, 1.0 - w
    return p

alt_dists = [
    ('Point mass at 3',   np.eye(10)[3]),
    ('Two-point {0,6}',   two_point(0, 6, 3.0)),
    ('Two-point {1,9}',   two_point(1, 9, 3.0)),
    ('Uniform on {1..5}', np.concatenate([[0.0], np.ones(5) / 5, np.zeros(4)])),
    ('Uniform on {0..6}', np.concatenate([np.ones(7) / 7, np.zeros(3)])),
]
print('\nComparison with same E[X]=3 constraint:')
print(f'  MaxEnt: H = {H(p_maxent):.4f} nats')
all_less = True
for name, p_alt in alt_dists:
    mean_alt = np.dot(p_alt, support)
    h_alt = H(p_alt)
    is_less = h_alt <= H(p_maxent) + 1e-9
    if not is_less: all_less = False
    print(f'  {name}: mean={mean_alt:.2f}, H={h_alt:.4f} (MaxEnt wins: {is_less})')
check_true('MaxEnt distribution has highest entropy', all_less)

print('\nTakeaway: among all distributions with fixed mean E[X]=3, the exponential-family '
      'solution p ∝ exp(-lambda*k) has maximum entropy — the MaxEnt principle.')
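As a cross-check on the Lagrange machinery (a sketch, not required by the exercise): for p ∝ e^{-λk}, the constrained mean equals -∂(log Z)/∂λ, so the brentq root-find on the mean is equivalent to matching a derivative of the log-partition function:

```python
import numpy as np

support = np.arange(10)
lam = 0.2  # any fixed multiplier; the identity holds pointwise

Z = np.sum(np.exp(-lam * support))
p = np.exp(-lam * support) / Z
mean_direct = np.dot(p, support)

# Central-difference estimate of -d(log Z)/d(lambda)
eps = 1e-6
logZ = lambda l: np.log(np.sum(np.exp(-l * support)))
mean_from_Z = -(logZ(lam + eps) - logZ(lam - eps)) / (2 * eps)

print(f"E[X] directly:       {mean_direct:.6f}")
print(f"-d(log Z)/d(lambda): {mean_from_Z:.6f}")
```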

Exercise 3 ★ — Joint Entropy, Conditional Entropy, Chain Rule

Part (a). Given joint distribution:

p(0,0) = 0.3, \quad p(0,1) = 0.1, \quad p(1,0) = 0.2, \quad p(1,1) = 0.4

Compute H(X), H(Y), H(X,Y), H(X \mid Y), H(Y \mid X), and I(X;Y).

Part (b). Verify the chain rule: H(X,Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y).

Part (c). Modify the joint distribution so that X \perp Y (independent). Verify that I(X;Y) = 0 and H(X,Y) = H(X) + H(Y).

Code cell 11

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 12

# Solution
# Exercise 3: Solution

def entropy_quantities_sol(joint):
    joint = np.asarray(joint, dtype=float)
    pX = joint.sum(axis=1)
    pY = joint.sum(axis=0)
    HX  = H(pX)
    HY  = H(pY)
    HXY = H(joint.ravel())
    HXGY = HXY - HY  # H(X|Y)
    HYGX = HXY - HX  # H(Y|X)
    IXY  = HX + HY - HXY
    return HX, HY, HXY, HXGY, HYGX, IXY

header('Exercise 3: Joint and Conditional Entropy')

# Part (a)
joint = np.array([[0.3, 0.1], [0.2, 0.4]])
HX, HY, HXY, HXGY, HYGX, IXY = entropy_quantities_sol(joint)
print(f'  H(X)     = {HX:.4f} nats')
print(f'  H(Y)     = {HY:.4f} nats')
print(f'  H(X,Y)   = {HXY:.4f} nats')
print(f'  H(X|Y)   = {HXGY:.4f} nats')
print(f'  H(Y|X)   = {HYGX:.4f} nats')
print(f'  I(X;Y)   = {IXY:.4f} nats')

# Part (b): Chain rule
check_close('H(X,Y) = H(X) + H(Y|X)', HXY, HX + HYGX)
check_close('H(X,Y) = H(Y) + H(X|Y)', HXY, HY + HXGY)
check_true('Subadditivity: H(X,Y) <= H(X)+H(Y)', HXY <= HX + HY + 1e-10)
check_true('I(X;Y) >= 0', IXY >= -1e-10)

# Part (c): Independent version
pX_marg = joint.sum(axis=1)
pY_marg = joint.sum(axis=0)
joint_indep = np.outer(pX_marg, pY_marg)
HX_i, HY_i, HXY_i, HXGY_i, HYGX_i, IXY_i = entropy_quantities_sol(joint_indep)
print()
check_close('I(X;Y) = 0 for independent', IXY_i, 0.0, tol=1e-8)
check_close('H(X,Y) = H(X)+H(Y) for independent', HXY_i, HX_i + HY_i)

print('\nTakeaway: H(X|Y) = H(X) - I(X;Y). Conditioning reduces entropy by the '
      'mutual information. When X and Y are independent, conditioning helps nothing.')
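The same mutual information can be computed as a KL divergence, I(X;Y) = KL(p(x,y) ‖ p(x)p(y)); a minimal sketch on the same joint distribution (valid here because all joint entries are positive):

```python
import numpy as np

joint = np.array([[0.3, 0.1], [0.2, 0.4]])
pX, pY = joint.sum(axis=1), joint.sum(axis=0)

# KL between the joint and the product of its marginals
I_kl = np.sum(joint * np.log(joint / np.outer(pX, pY)))

# Entropy identity: I = H(X) + H(Y) - H(X,Y)
I_h = (-np.sum(pX * np.log(pX)) - np.sum(pY * np.log(pY))
       + np.sum(joint * np.log(joint)))

print(f"I via KL:      {I_kl:.6f} nats")
print(f"I via entropy: {I_h:.6f} nats")
```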

Exercise 4 ★★ — Entropy Rate of a Markov Chain

Consider a 3-state Markov chain with transition matrix:

P = \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.1 & 0.3 & 0.6 \end{pmatrix}

Part (a). Find the stationary distribution \boldsymbol{\mu} satisfying \boldsymbol{\mu}^\top P = \boldsymbol{\mu}^\top and \sum_i \mu_i = 1.

Part (b). Compute the entropy rate \mathcal{H} = -\sum_x \mu_x \sum_y P_{xy} \log P_{xy} in nats.

Part (c). Compare to the i.i.d. rate H(\boldsymbol{\mu}). Explain why the Markov chain rate is lower.

Part (d). Simulate 10,000 steps and verify that the rolling perplexity e^{\hat{\mathcal{H}}} converges to the true perplexity.

Code cell 14

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 15

# Solution
# Exercise 4: Solution

from scipy.special import entr

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])

def stationary_distribution_sol(P):
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    mu = np.real(eigvecs[:, idx])
    return mu / mu.sum()

def markov_entropy_rate_sol(P, mu):
    H_per_state = np.array([np.sum(entr(P[i])) for i in range(len(P))])
    return np.dot(mu, H_per_state)

def header(t): print('\n'+'='*len(t)+'\n'+t+'\n'+'='*len(t))
def check_close(n,g,e,tol=1e-6):
    ok=np.allclose(g,e,atol=tol,rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} {n}")
    return ok
def check_true(n,c): print(f"{'PASS' if c else 'FAIL'} {n}"); return c

header('Exercise 4: Entropy Rate of a 3-State Markov Chain')

# Part (a)
mu = stationary_distribution_sol(P)
print(f'Stationary distribution: {mu.round(4)}')
check_close('mu @ P = mu', mu @ P, mu)
check_close('sum(mu) = 1', mu.sum(), 1.0)

# Part (b)
H_rate = markov_entropy_rate_sol(P, mu)
print(f'\nEntropy rate: H = {H_rate:.4f} nats')
print(f'Perplexity:   PPL = {np.exp(H_rate):.4f}')

# Part (c)
H_iid = np.sum(entr(mu))
print(f'i.i.d. rate:  H = {H_iid:.4f} nats')
check_true('Markov entropy rate < i.i.d. rate', H_rate < H_iid)
print(f'Reduction:    {H_iid - H_rate:.4f} nats (temporal structure reduces uncertainty)')

# Part (d): Simulation
np.random.seed(42)
T = 10000
state = np.random.choice(3, p=mu)
nll_log = []
for _ in range(T):
    next_state = np.random.choice(3, p=P[state])
    nll_log.append(-np.log(P[state, next_state]))
    state = next_state

ppl_estimate = np.exp(np.mean(nll_log))
ppl_true = np.exp(H_rate)
error_pct = abs(ppl_estimate - ppl_true) / ppl_true * 100

check_close('Simulated PPL ≈ true PPL', ppl_estimate, ppl_true, tol=0.1)
print(f'True PPL: {ppl_true:.4f}, Simulated PPL ({T} steps): {ppl_estimate:.4f}, Error: {error_pct:.2f}%')

print('\nTakeaway: Markov chain entropy rate is always <= i.i.d. rate because temporal '
      'structure makes the next state predictable given the current state. '
      'LLMs exploit this long-range structure to achieve low perplexity.')
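In practice the transition matrix is unknown and must be estimated from data; a sketch (sample size and seed are arbitrary) that re-estimates P from bigram counts and recovers the entropy rate with a plug-in formula:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])

# Simulate a long trajectory
T = 50_000
states = np.empty(T, dtype=int)
states[0] = 0
for t in range(1, T):
    states[t] = rng.choice(3, p=P[states[t - 1]])

# Plug-in estimates of P and the stationary distribution from bigram counts
counts = np.zeros((3, 3))
np.add.at(counts, (states[:-1], states[1:]), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)
mu_hat = counts.sum(axis=1) / counts.sum()

safe = np.where(P_hat > 0, P_hat, 1.0)  # avoid log(0); a zero cell contributes 0
rate_hat = -np.sum(mu_hat[:, None] * P_hat * np.log(safe))
print(f"plug-in entropy rate: {rate_hat:.4f} nats")  # exact rate from Part (b) is ~0.9153
```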

Exercise 5 ★★ — Huffman Coding and Shannon Source Coding

Part (a). Given the source distribution p(A) = 0.4, p(B) = 0.3, p(C) = 0.2, p(D) = 0.1:

  • Compute H(X) in bits
  • Construct the optimal Huffman code
  • Verify H(X) \le \bar{L} < H(X) + 1

Part (b). Implement huffman_code(symbols, probs) returning a dict of codewords.

Part (c). For the dyadic distribution p(x_k) = 2^{-k} for k = 1, \ldots, n-1 and p(x_n) = 2^{-(n-1)}, with n = 5: show that Huffman coding achieves \bar{L} = H(X) exactly.

Code cell 17

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 18

# Solution
# Exercise 5: Solution

import heapq

def huffman_code_sol(symbols, probs):
    heap = [[p, [s, '']] for s, p in zip(symbols, probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]: pair[1] = '0' + pair[1]
        for pair in hi[1:]: pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {pair[0]: pair[1] for pair in sorted(heap[0][1:])}

def header(t): print('\n'+'='*len(t)+'\n'+t+'\n'+'='*len(t))
def check_close(n,g,e,tol=1e-6): ok=np.allclose(g,e,atol=tol,rtol=tol); print(f"{'PASS' if ok else 'FAIL'} {n}"); return ok
def check_true(n,c): print(f"{'PASS' if c else 'FAIL'} {n}"); return c

header('Exercise 5: Huffman Coding and Source Coding Theorem')

# Part (a) & (b)
symbols = ['A', 'B', 'C', 'D']
probs   = [0.4, 0.3, 0.2, 0.1]
code = huffman_code_sol(symbols, probs)

H_X  = sum(-p * np.log2(p) for p in probs)
Lbar = sum(probs[i] * len(code[s]) for i, s in enumerate(symbols))

print('Huffman code:')
for s, p in zip(symbols, probs):
    cw = code[s]
    print(f'  {s}: p={p:.2f}, code={cw}, len={len(cw)}, ideal={-np.log2(p):.2f}')
print(f'H(X)  = {H_X:.4f} bits')
print(f'L_bar = {Lbar:.4f} bits')

check_true('H(X) <= L_bar', H_X <= Lbar + 1e-10)
check_true('L_bar < H(X) + 1', Lbar < H_X + 1 + 1e-10)

# Part (c): Dyadic distribution achieves H(X) exactly
n = 5
probs_dyadic = [2**(-k) for k in range(1, n)]
probs_dyadic.append(2**(-(n-1)))  # last term same as second-to-last
symbols_dyadic = [f'x{k}' for k in range(1, n+1)]
code_dyadic = huffman_code_sol(symbols_dyadic, probs_dyadic)

H_dyadic  = sum(-p * np.log2(p) for p in probs_dyadic)
Lbar_dyadic = sum(probs_dyadic[i] * len(code_dyadic[s]) for i, s in enumerate(symbols_dyadic))
print(f'\nDyadic distribution (n={n}):')
print(f'  H(X) = {H_dyadic:.4f} bits')
print(f'  L_bar = {Lbar_dyadic:.4f} bits')
check_close('Dyadic: L_bar = H(X) exactly', Lbar_dyadic, H_dyadic, tol=0.01)

print('\nTakeaway: Huffman coding always satisfies H <= L_bar < H+1. '
      'For dyadic distributions (p = 2^-k), it achieves H exactly. '
      'Arithmetic coding removes the rounding loss for non-dyadic distributions.')
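A complementary check is Kraft's inequality: any prefix-free code satisfies sum(2^-len) <= 1, with equality for a complete code like Huffman's. A sketch on the Part (a) code — the bit patterns below are one valid assignment (Huffman implementations may flip 0/1 choices), but the lengths (1, 2, 3, 3) are not:

```python
code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

# Kraft sum: 2^-1 + 2^-2 + 2^-3 + 2^-3
kraft = sum(2.0 ** -len(cw) for cw in code.values())
print("Kraft sum:", kraft)  # 1.0 for a complete code

# Prefix-free check: no codeword is a prefix of a different codeword
cws = list(code.values())
prefix_free = all(not b.startswith(a) for a in cws for b in cws if a != b)
print("prefix-free:", prefix_free)
```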

Exercise 6 ★★ — Differential Entropy and Gaussian MaxEnt

Part (a). Compute the differential entropy h(X) for:

  • (i) \mathcal{U}(0,1), (ii) \mathcal{U}(0,2), (iii) \mathcal{U}(0,0.5)
  • (iv) \mathcal{N}(0,1), (v) \mathcal{N}(0,4)
  • (vi) \operatorname{Exp}(1)

Verify that \mathcal{U}(0,0.5) has negative differential entropy.

Part (b). Verify numerically that h(aX) = h(X) + \log|a| for a = 2 and X \sim \mathcal{N}(0,1).

Part (c). Numerically verify the Gaussian MaxEnt theorem: show that for \sigma^2 = 2, the Gaussian has strictly higher differential entropy than the Laplace and uniform distributions with the same variance.

Code cell 20

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 21

# Solution
# Exercise 6: Solution

from scipy.stats import norm, laplace
from scipy.special import entr

def diff_entropy_gaussian_sol(mu, sigma):
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

def diff_entropy_uniform_sol(a, b):
    return np.log(b - a)

def diff_entropy_exp_sol(lam):
    return 1 - np.log(lam)

def header(t): print('\n'+'='*len(t)+'\n'+t+'\n'+'='*len(t))
def check_close(n,g,e,tol=1e-4): ok=np.allclose(g,e,atol=tol,rtol=tol); print(f"{'PASS' if ok else 'FAIL'} {n}"); return ok
def check_true(n,c): print(f"{'PASS' if c else 'FAIL'} {n}"); return c

header('Exercise 6: Differential Entropy and Gaussian MaxEnt')

# Part (a)
cases = [
    ('Uniform(0,1)',   diff_entropy_uniform_sol(0, 1),       0.0),
    ('Uniform(0,2)',   diff_entropy_uniform_sol(0, 2),       np.log(2)),
    ('Uniform(0,0.5)', diff_entropy_uniform_sol(0, 0.5),    np.log(0.5)),
    ('Normal(0,1)',    diff_entropy_gaussian_sol(0, 1),      0.5*np.log(2*np.pi*np.e)),
    ('Normal(0,4)',    diff_entropy_gaussian_sol(0, 2),      0.5*np.log(2*np.pi*np.e*4)),
    ('Exp(1)',         diff_entropy_exp_sol(1),               1.0),
]
for name, h_val, h_expected in cases:
    print(f'  {name}: h = {h_val:.4f}' + (' [NEGATIVE]' if h_val < 0 else ''))
    check_close(f'h({name})', h_val, h_expected)

# Part (b): h(aX) = h(X) + log|a|
sigma = 1.0; a = 2.0
h_X  = diff_entropy_gaussian_sol(0, sigma)
h_aX = diff_entropy_gaussian_sol(0, a * sigma)
check_close('h(aX) = h(X) + log|a|', h_aX, h_X + np.log(abs(a)))

# Part (c): Gaussian MaxEnt under variance constraint
sigma_sq = 2.0; sigma_val = np.sqrt(sigma_sq)
h_gaussian = diff_entropy_gaussian_sol(0, sigma_val)
# Laplace with same variance: Var = 2b^2, so b = sqrt(sigma^2/2)
b_laplace = np.sqrt(sigma_sq / 2)
h_laplace = 1 + np.log(2 * b_laplace)
# Uniform with same variance: Var = (b-a)^2/12, so b-a = sqrt(12*sigma^2)
width = np.sqrt(12 * sigma_sq)
h_uniform = np.log(width)

print(f'\nVariance = {sigma_sq}:')
print(f'  h(Gaussian) = {h_gaussian:.4f} nats')
print(f'  h(Laplace)  = {h_laplace:.4f} nats')
print(f'  h(Uniform)  = {h_uniform:.4f} nats')
check_true('Gaussian > Laplace (same var)', h_gaussian > h_laplace)
check_true('Gaussian > Uniform (same var)', h_gaussian > h_uniform)

print('\nTakeaway: The Gaussian maximizes differential entropy under variance constraint. '
      'This makes it the "most uncertain" distribution given only a variance budget, '
      'justifying Gaussian priors in Bayesian ML and Gaussian noise in diffusion models.')
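Differential entropy can also be estimated without the closed form, as the sample average of -log f(X); a Monte Carlo sketch for the standard Gaussian (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(0.0, sigma, size=200_000)

# h(X) = E[-log f(X)]: average the negative log-density over samples
log_pdf = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_mc = -np.mean(log_pdf)
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(f"Monte Carlo h: {h_mc:.4f}   closed form: {h_closed:.4f}")
```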

Exercise 7 ★★★ — Rényi Entropy Family

Part (a). Implement renyi_entropy(p, alpha) that computes H_\alpha(X) in nats for any \alpha \ge 0, including the limiting cases \alpha \to 0, \alpha \to 1, and \alpha \to \infty.

Part (b). For the distribution \mathbf{p} = (0.5, 0.3, 0.15, 0.05), compute H_\alpha for \alpha \in \{0, 0.5, 1, 2, 5, \infty\}. Verify the ordering H_0 \ge H_1 \ge H_2 \ge H_\infty.

Part (c). Compute the min-entropy H_\infty = -\log \max_x p(x) directly and verify that it matches your implementation as \alpha \to \infty.

Part (d). Compute the guessing advantage for this distribution: P(\text{correct in one guess}) = \max_x p(x) = e^{-H_\infty}, with H_\infty in nats.

Code cell 23

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 24

# Solution
# Exercise 7: Solution

from scipy.special import entr

def renyi_entropy_sol(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # remove zeros (don't contribute)
    if np.isclose(alpha, 1.0):
        return np.sum(entr(p))
    if np.isinf(alpha):
        return -np.log(np.max(p))
    if np.isclose(alpha, 0.0):
        return np.log(len(p))
    return (1.0 / (1.0 - alpha)) * np.log(np.sum(p ** alpha))

def header(t): print('\n'+'='*len(t)+'\n'+t+'\n'+'='*len(t))
def check_close(n,g,e,tol=1e-6): ok=np.allclose(g,e,atol=tol,rtol=tol); print(f"{'PASS' if ok else 'FAIL'} {n}"); return ok
def check_true(n,c): print(f"{'PASS' if c else 'FAIL'} {n}"); return c

header('Exercise 7: Rényi Entropy Family')

p = np.array([0.5, 0.3, 0.15, 0.05])
alphas = [0, 0.5, 1.0, 2.0, 5.0, np.inf]
names  = ['H_0 (Hartley)', 'H_0.5', 'H_1 (Shannon)', 'H_2 (Collision)', 'H_5', 'H_inf (Min)']

# Part (b)
H_vals = [renyi_entropy_sol(p, a) for a in alphas]
print('Rényi entropies for p = (0.5, 0.3, 0.15, 0.05):')
for a, h, name in zip(alphas, H_vals, names):
    a_str = 'inf' if np.isinf(a) else str(a)
    print(f'  alpha={a_str:>4}: H = {h:.4f} nats  [{name}]')

# Verify ordering H_0 >= H_0.5 >= H_1 >= H_2 >= H_5 >= H_inf
monotone = all(H_vals[i] >= H_vals[i+1] - 1e-10 for i in range(len(H_vals)-1))
check_true('H_0 >= H_0.5 >= H_1 >= H_2 >= H_5 >= H_inf', monotone)

# Part (c): Min-entropy direct
H_inf_direct = -np.log(p.max())
check_close('H_inf direct = renyi(p, inf)', H_inf_direct, renyi_entropy_sol(p, np.inf))

# Part (d): Guessing advantage
guessing_advantage = p.max()
guessing_from_minH = np.exp(-renyi_entropy_sol(p, np.inf))
print(f'\nGuessing advantage: max_x p(x) = {guessing_advantage:.4f}')
print(f'From min-entropy:   exp(-H_inf) = {guessing_from_minH:.4f}')
check_close('Guessing advantage = exp(-H_inf)', guessing_advantage, guessing_from_minH)

# Bonus: Uniform vs concentrated distribution comparison
p_uniform = np.ones(4) / 4
print(f'\nUniform (4): all H_alpha = log(4) = {np.log(4):.4f}')
for a in [0, 1.0, 2.0, np.inf]:
    h_u = renyi_entropy_sol(p_uniform, a)
    h_p = renyi_entropy_sol(p, a)
    a_str = 'inf' if np.isinf(a) else str(a)
    print(f'  alpha={a_str}: uniform={h_u:.4f}, concentrated={h_p:.4f}')

print('\nTakeaway: Rényi entropy interpolates between Hartley (pure support size), '
      'Shannon (expected surprise), and min-entropy (worst-case surprise). '
      'Min-entropy is the cryptographic metric: high H_inf = hard to guess.')
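Collision entropy H_2 also has a direct operational reading: exp(-H_2) = Σ p(x)² is the probability that two independent draws agree. A sampling sketch (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.15, 0.05])

# exp(-H_2) = sum(p^2) = probability that two independent draws coincide
coll_exact = np.sum(p ** 2)
a = rng.choice(4, size=200_000, p=p)
b = rng.choice(4, size=200_000, p=p)
coll_mc = np.mean(a == b)

print(f"P(collision): exact {coll_exact:.4f}, sampled {coll_mc:.4f}")
print(f"H_2 = {-np.log(coll_exact):.4f} nats")
```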

Exercise 8 ★★★ — MaxEnt RL: Entropy Bonus in Policy Optimization

This exercise implements a simplified version of the soft policy gradient underlying SAC.

Part (a). Implement train_bandit(r, alpha, lr, steps) for a 5-action bandit with reward vector \mathbf{r} = (0.8, 0.6, 0.4, 0.2, 0.0). The objective is J(\pi) = \mathbb{E}_\pi[r] + \alpha H(\pi).

Part (b). Train for \alpha \in \{0, 0.1, 0.5, 1.0\} and 500 steps. For each \alpha > 0, verify the final policy satisfies \pi^*(a) \propto e^{r(a)/\alpha} (the MaxEnt optimal policy).

Part (c). Show that as \alpha \to 0 the optimal policy converges to greedy (argmax), and as \alpha \to \infty it converges to uniform.

Part (d). Plot policy entropy over training for all \alpha values.

Code cell 26

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 27

# Solution
# Exercise 8: Solution

from scipy.special import entr
import matplotlib.pyplot as plt

r = np.array([0.8, 0.6, 0.4, 0.2, 0.0])
n_actions = len(r)

def softmax_vec(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def train_bandit_sol(r, alpha=0.0, lr=0.1, steps=500, seed=0):
    np.random.seed(seed)
    theta = np.zeros(len(r))
    entropies = []
    for _ in range(steps):
        pi = softmax_vec(theta)
        H_pi = np.sum(entr(pi))
        # Soft Q-value: r(a) + alpha * (-log pi(a))
        if alpha > 0:
            q = r + alpha * (-np.log(np.maximum(pi, 1e-15)))
        else:
            q = r
        baseline = np.dot(pi, q)
        grad = pi * (q - baseline)  # policy gradient
        theta += lr * grad
        entropies.append(H_pi)
    return np.array(entropies), softmax_vec(theta)

def header(t): print('\n'+'='*len(t)+'\n'+t+'\n'+'='*len(t))
def check_close(n,g,e,tol=1e-2): ok=np.allclose(g,e,atol=tol,rtol=tol); print(f"{'PASS' if ok else 'FAIL'} {n}"); return ok
def check_true(n,c): print(f"{'PASS' if c else 'FAIL'} {n}"); return c

header('Exercise 8: MaxEnt RL — Entropy Bonus in Policy Optimization')

alphas = [0.0, 0.1, 0.5, 1.0]
colors = ['#CC3311', '#EE7733', '#009988', '#0077BB']
results = {}

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

for alpha, color in zip(alphas, colors):
    entropies, final_pi = train_bandit_sol(r, alpha=alpha)
    results[alpha] = final_pi
    axes[0].plot(entropies, color=color, alpha=0.8, label=f'alpha={alpha}')

# Part (b): Verify optimal policy = softmax(r/alpha)
print('Final policies and MaxEnt optimal verification:')
print(f"{'alpha':>6} {'Final policy':>45} {'matches softmax(r/a)':>22}")
print('-' * 80)
for alpha in alphas:
    final = results[alpha]
    if alpha > 0:
        pi_opt = softmax_vec(r / alpha)
        match = np.allclose(final, pi_opt, atol=0.05)
    else:
        # alpha=0: should be near-greedy
        match = final.argmax() == r.argmax()
    print(f"{alpha:>6.1f} {str(final.round(3)):>45} {str(match):>22}")

# Part (c): Limiting behavior
_, pi_greedy  = train_bandit_sol(r, alpha=1e-4, steps=2000)
_, pi_uniform = train_bandit_sol(r, alpha=1e4,  steps=2000)

check_true('alpha->0: policy ≈ greedy (argmax=0)', pi_greedy.argmax() == 0)
check_close('alpha->inf: policy ≈ uniform', pi_uniform, np.ones(n_actions)/n_actions, tol=0.05)

axes[0].set_title('Policy entropy during MaxEnt RL training')
axes[0].set_xlabel('Training step')
axes[0].set_ylabel('$H(\\pi)$ (nats)')
axes[0].legend()

# Final policy bar chart
x = np.arange(n_actions)
width = 0.2
for i, (alpha, color) in enumerate(zip(alphas, colors)):
    axes[1].bar(x + i*width, results[alpha], width, color=color,
               label=f'alpha={alpha}', alpha=0.85)
axes[1].set_title('Final policy for each alpha')
axes[1].set_xlabel('Action')
axes[1].set_ylabel('$\\pi(a)$')
axes[1].set_xticks(x + width*1.5)
axes[1].set_xticklabels([f'a{i}\nr={ri}' for i, ri in enumerate(r)])
axes[1].legend()
fig.tight_layout()
plt.show()

print('\nTakeaway: Entropy regularization (alpha > 0) prevents greedy collapse. '
      'The optimal MaxEnt RL policy is pi^*(a) ∝ exp(r(a)/alpha) — '
      'a Boltzmann/softmax distribution. Higher alpha = more exploration = '
      'higher entropy = better coverage of suboptimal actions. '
      'This is the foundation of SAC in robotics and RLHF in LLMs.')
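A closed-form check on what training converges to (a sketch; the identity follows by substituting π* back into J): the maximum of J(π) = E_π[r] + αH(π) equals the soft value α log Σ_a exp(r(a)/α):

```python
import numpy as np

r = np.array([0.8, 0.6, 0.4, 0.2, 0.0])
alpha = 0.5

# Optimal MaxEnt policy: Boltzmann distribution over rewards
pi = np.exp(r / alpha)
pi /= pi.sum()

H_pi = -np.sum(pi * np.log(pi))
J_opt = np.dot(pi, r) + alpha * H_pi                 # objective at the optimum
V_soft = alpha * np.log(np.sum(np.exp(r / alpha)))   # soft value (logsumexp)

print(f"J(pi*) = {J_opt:.6f}")
print(f"alpha * logsumexp(r/alpha) = {V_soft:.6f}")
```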

What to Review After Finishing

  • Ex 1 — compute entropy for all standard distributions; understand the shape of h(p)
  • Ex 2 — derive MaxEnt distribution via Lagrange multipliers from scratch
  • Ex 3 — verify the chain rule H(X,Y) = H(X) + H(Y \mid X) numerically
  • Ex 4 — implement stationary distribution and entropy rate for Markov chains
  • Ex 5 — implement Huffman coding; verify source coding theorem bound
  • Ex 6 — compute differential entropy; prove Gaussian MaxEnt numerically
  • Ex 7 — implement Rényi family; understand min-entropy as guessing metric
  • Ex 8 — implement MaxEnt policy gradient; verify convergence to softmax optimal

References

  1. Cover & Thomas (2006) — Elements of Information Theory, Chapters 1-5
  2. MacKay (2003) — Information Theory, Inference, and Learning Algorithms, Chapters 1-5
  3. Haarnoja et al. (2018) — Soft Actor-Critic. arXiv:1801.01290
  4. Jaynes (1957) — Information Theory and Statistical Mechanics. Physical Review 106(4)

Exercise 9 — Huffman Length Versus Entropy

Compute entropy and expected Huffman code length for a small source. Verify the source-coding inequality H(X) \le \bar{L} < H(X) + 1 in bits.

Code cell 30

# Your Solution
print("Compare entropy with an expected prefix-code length.")

Code cell 31

# Solution
header("Exercise 9: Huffman Length Versus Entropy")
p = np.array([0.4, 0.25, 0.2, 0.15])
lengths = np.array([1, 2, 3, 3])
L = float(np.sum(p * lengths))
Hb = H_bits(p)
print("entropy bits:", round(Hb, 6))
print("expected length:", round(L, 6))
check_true("source-coding bound", Hb <= L < Hb + 1)
print("Takeaway: entropy is the ideal expected code length; prefix codes pay less than one extra bit.")
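The bound does not depend on using the optimal code: Shannon code lengths ⌈-log2 p_i⌉ satisfy it for any source. A sketch on the same distribution (it yields a longer code than Huffman here, but still within one bit of H):

```python
import numpy as np

p = np.array([0.4, 0.25, 0.2, 0.15])

# Shannon code: l_i = ceil(-log2 p_i); Kraft-satisfiable by construction
lengths = np.ceil(-np.log2(p))
L_shannon = float(np.sum(p * lengths))
H_bits_val = float(np.sum(-p * np.log2(p)))
kraft = float(np.sum(2.0 ** -lengths))

print(f"H = {H_bits_val:.4f} bits, Shannon L = {L_shannon:.4f} bits, Kraft sum = {kraft}")
```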

Exercise 10 — Entropy Bonus in a Policy

For a softmax policy, increase the temperature and verify that policy entropy increases while the distribution becomes less peaked.

Code cell 33

# Your Solution
print("Compare policy entropy at two temperatures.")

Code cell 34

# Solution
header("Exercise 10: Entropy Bonus")
logits = np.array([3.0, 1.0, -0.5])
p_cold = softmax(logits, T=0.7)
p_warm = softmax(logits, T=2.0)
H_cold, H_warm = H(p_cold), H(p_warm)
print("cold policy:", np.round(p_cold, 4), "H=", round(float(H_cold), 6))
print("warm policy:", np.round(p_warm, 4), "H=", round(float(H_warm), 6))
check_true("higher temperature increases entropy", H_warm > H_cold)
print("Takeaway: entropy regularization encourages exploration by flattening the action distribution.")
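A finer-grained sweep (a self-contained sketch with its own softmax helper) shows entropy rising monotonically with temperature toward the log(3) uniform maximum:

```python
import numpy as np

def softmax_T(logits, T):
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([3.0, 1.0, -0.5])
entropies = []
for T in [0.25, 0.5, 1.0, 2.0, 8.0]:
    p = softmax_T(logits, T)
    Ht = -np.sum(p * np.log(p))  # safe: softmax outputs are strictly positive
    entropies.append(Ht)
    print(f"T = {T:5}: H = {Ht:.4f} nats (uniform max = {np.log(3):.4f})")
```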