Math for LLMs — Statistics / Bayesian Inference

Converted from exercises.ipynb for web reading.

Bayesian Inference — Exercises

10 exercises covering conjugate Bayes, MAP estimation, posterior prediction, model comparison, sampling, variational ideas, and Bayesian decision-making.

Each exercise uses three cells:

  • Problem: markdown cell with the task description
  • Your Solution: code cell with scaffolding
  • Solution: code cell with the reference solution and checks

Difficulty Levels

  • * (Exercises 1-3): core posterior mechanics
  • ** (Exercises 4-6): prediction, model comparison, hierarchical shrinkage
  • *** (Exercises 7-10): sampling, ELBO reasoning, Bayesian decision-making

Topic Map

  • Exercise 1: Beta-Binomial updating
  • Exercise 2: Gaussian-Gaussian shrinkage
  • Exercise 3: MAP as regularized MLE
  • Exercise 4: Posterior predictive distributions
  • Exercise 5: Bayes factors
  • Exercise 6: Hierarchical Bayes
  • Exercise 7: MCMC
  • Exercise 8: Variational inference / ELBO
  • Exercise 9: Thompson sampling / Bayesian action

Additional applied exercise: Exercise 10, posterior predictive calibration.

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
import numpy.linalg as la

try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)

def header(title):
    print("\n" + "=" * len(title))
    print(title)
    print("=" * len(title))

def check_close(name, got, expected, tol=1e-8):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} {name}")
    if not ok:
        print("  expected:", expected)
        print("  got     :", got)
    return ok

def check_true(name, cond):
    print(f"{'PASS' if cond else 'FAIL'} {name}")
    return cond

# Plus any shared utility functions needed across exercises
from scipy import stats, special

def beta_posterior(alpha, beta, k, n):
    return alpha + k, beta + n - k

print("Exercise setup complete.")

Exercise 1 * — Beta-Binomial Posterior Update

A binary moderation pipeline flags 17 unsafe outputs out of 40 audited generations.

Task

  • (a) Start from a prior Beta(alpha, beta) with alpha=2, beta=5.
  • (b) Compute the posterior parameters after seeing the data.
  • (c) Return the posterior mean unsafe rate.
  • (d) Return a 95% equal-tail credible interval.

Code cell 5

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 6

# Solution
def beta_update(alpha, beta, k, n):
    a_post = alpha + k
    b_post = beta + n - k
    post_mean = a_post / (a_post + b_post)
    ci = stats.beta.ppf([0.025, 0.975], a_post, b_post)
    return a_post, b_post, post_mean, ci

a_post, b_post, post_mean, ci = beta_update(2, 5, 17, 40)

header("Exercise 1: Beta-Binomial Posterior Update")
print(f"Posterior parameters: Beta({a_post}, {b_post})")
print(f"Posterior mean: {post_mean:.6f}")
print(f"95% credible interval: {ci}")
check_close("posterior alpha", a_post, 19)
check_close("posterior beta", b_post, 28)
check_true("credible interval is inside [0, 1]", 0 <= ci[0] <= ci[1] <= 1)
print("\nTakeaway: Conjugate Bayes updates by adding pseudo-counts to observed counts.")
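
The pseudo-count takeaway can be made explicit: the posterior mean is a convex combination of the prior mean and the sample proportion, weighted by the prior's effective sample size and the actual sample size. A minimal sketch using the numbers from this exercise:

```python
import numpy as np

alpha, beta, k, n = 2, 5, 17, 40
prior_mean = alpha / (alpha + beta)        # 2/7
mle = k / n                                # 17/40
w = n / (n + alpha + beta)                 # weight placed on the data

post_mean = w * mle + (1 - w) * prior_mean
# Identical to the conjugate-update mean (alpha + k) / (alpha + beta + n)
assert np.isclose(post_mean, (alpha + k) / (alpha + beta + n))
print(f"weight on data: {w:.3f}, posterior mean: {post_mean:.6f}")
```

As n grows, the weight on the data approaches 1 and the prior's pseudo-counts become negligible.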

Exercise 2 * — Gaussian Prior Plus Gaussian Likelihood

You monitor the mean latency drift of a model. The historical belief is mu ~ N(0, 1.5^2). You observe four latency deltas with known observation standard deviation sigma=0.8:

data = [0.4, 0.1, 0.5, 0.2]

Task

  • (a) Compute the posterior mean.
  • (b) Compute the posterior variance.
  • (c) Verify that the posterior mean lies between the prior mean and the sample mean.

Code cell 8

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 9

# Solution
def gaussian_update(mu0, tau0, sigma, data):
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau0**2 + n * xbar / sigma**2)
    return post_mean, post_var

data = np.array([0.4, 0.1, 0.5, 0.2])
post_mean, post_var = gaussian_update(0.0, 1.5, 0.8, data)

header("Exercise 2: Gaussian-Gaussian Updating")
print(f"Posterior mean: {post_mean:.6f}")
print(f"Posterior var : {post_var:.6f}")
check_true("posterior mean is between prior mean and sample mean", 0.0 <= post_mean <= data.mean())
check_true("posterior variance is positive", post_var > 0)
print("\nTakeaway: Gaussian Bayesian updating is precision-weighted shrinkage.")
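
The precision-weighted view can be spelled out directly: each source of information contributes in proportion to its precision (inverse variance). A small sketch reusing the numbers above:

```python
import numpy as np

mu0, tau0, sigma = 0.0, 1.5, 0.8
data = np.array([0.4, 0.1, 0.5, 0.2])
n, xbar = len(data), data.mean()

prec_prior = 1 / tau0**2      # precision of the prior belief
prec_data = n / sigma**2      # precision of the sample mean
w_data = prec_data / (prec_prior + prec_data)

post_mean = w_data * xbar + (1 - w_data) * mu0
post_var = 1 / (prec_prior + prec_data)
print(f"weight on data : {w_data:.4f}")
print(f"posterior mean : {post_mean:.6f}")
print(f"posterior var  : {post_var:.6f}")
```

The posterior mean is literally a precision-weighted average; with only four points and a fairly diffuse prior, most of the weight already sits on the data.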

Exercise 3 * — MAP as Ridge Regression

Consider a linear model with Gaussian noise variance sigma2=0.25 and Gaussian prior theta ~ N(0, lambda^{-1} I) where lambda=2.0.

Task

  • (a) Compute the MAP estimator.
  • (b) Compare its norm to the OLS solution.
  • (c) Verify that MAP shrinks toward zero.

Code cell 11

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 12

# Solution
def ridge_map(X, y, lam, sigma2):
    return la.solve(X.T @ X / sigma2 + lam * np.eye(X.shape[1]), X.T @ y / sigma2)

X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.2, 1.0, 1.8, 2.9])
theta_map = ridge_map(X, y, 2.0, 0.25)
theta_ols = la.lstsq(X, y, rcond=None)[0]

header("Exercise 3: MAP as Ridge")
print(f"OLS estimate : {theta_ols}")
print(f"MAP estimate : {theta_map}")
check_true("MAP has smaller or equal norm than OLS", la.norm(theta_map) <= la.norm(theta_ols) + 1e-12)
check_true("MAP is finite", np.isfinite(theta_map).all())
print("\nTakeaway: A Gaussian prior turns point estimation into regularized MAP optimization.")
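
The MAP estimator above is exactly ridge regression with penalty lambda * sigma2: multiplying the normal equations through by sigma2 makes the equivalence explicit. A quick check, reusing the design matrix from the solution:

```python
import numpy as np
import numpy.linalg as la

X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.2, 1.0, 1.8, 2.9])
lam, sigma2 = 2.0, 0.25

# MAP form: (X'X / sigma2 + lam I)^{-1} X'y / sigma2
theta_map = la.solve(X.T @ X / sigma2 + lam * np.eye(2), X.T @ y / sigma2)
# Ridge form: (X'X + lam * sigma2 I)^{-1} X'y  (same system scaled by sigma2)
theta_ridge = la.solve(X.T @ X + lam * sigma2 * np.eye(2), X.T @ y)

assert np.allclose(theta_map, theta_ridge)
print(theta_ridge)
```

This is why the effective regularization strength depends on the noise level: noisier data (larger sigma2) pulls the estimate harder toward the prior.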

Exercise 4 ** — Gamma-Poisson Posterior Predictive

An LLM service receives request counts per minute. Over 6 minutes, the counts are:

counts = [4, 3, 6, 5, 2, 7]

Assume X_i | lambda ~ Poisson(lambda) and lambda ~ Gamma(alpha=2, beta=1) with rate parameterization.

Task

  • (a) Compute the posterior parameters.
  • (b) Compute the posterior mean of lambda.
  • (c) Compute the posterior predictive mean for the next minute.

Code cell 14

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 15

# Solution
def gamma_poisson_update(alpha, beta, counts):
    counts = np.asarray(counts, dtype=float)
    alpha_post = alpha + counts.sum()
    beta_post = beta + len(counts)
    post_mean = alpha_post / beta_post
    pred_mean = post_mean
    return alpha_post, beta_post, post_mean, pred_mean

alpha_post, beta_post, post_mean, pred_mean = gamma_poisson_update(2.0, 1.0, [4, 3, 6, 5, 2, 7])

header("Exercise 4: Gamma-Poisson Posterior Predictive")
print(f"Posterior alpha: {alpha_post}")
print(f"Posterior beta : {beta_post}")
print(f"Posterior mean lambda: {post_mean:.6f}")
print(f"Predictive mean next count: {pred_mean:.6f}")
check_close("posterior alpha", alpha_post, 29.0)
check_close("posterior beta", beta_post, 7.0)
print("\nTakeaway: Posterior prediction averages over rate uncertainty instead of fixing one estimated rate.")
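
The full posterior predictive here has closed form: a Gamma-mixed Poisson is negative binomial, with r = alpha_post and success probability p = beta_post / (beta_post + 1) in scipy's nbinom parameterization. A short sketch confirming its mean matches the posterior mean of lambda while its variance exceeds a plain Poisson's:

```python
import numpy as np
from scipy import stats

alpha_post, beta_post = 29.0, 7.0
# Posterior predictive of a Poisson count under a Gamma(rate) posterior:
# negative binomial with r = alpha_post, p = beta_post / (beta_post + 1).
pred = stats.nbinom(alpha_post, beta_post / (beta_post + 1))

print(f"predictive mean    : {pred.mean():.6f}")  # equals alpha_post / beta_post
print(f"predictive variance: {pred.var():.6f}")   # exceeds the mean: rate uncertainty
```

The extra variance over a Poisson with the same mean is exactly the contribution of remaining uncertainty about lambda.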

Exercise 5 ** — Bayes Factor vs p-Value

You observe data = [0.10, 0.25, 0.28, 0.34, 0.31, 0.22] from a Gaussian model with known sigma=0.25.

Compare:

  • H0: mu = 0
  • H1: mu ~ N(0, tau^2) with tau=0.3

Task

  • (a) Compute the z-score and two-sided p-value.
  • (b) Compute the Bayes factor BF10.
  • (c) Explain briefly why the two numbers answer different questions.

Code cell 17

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 18

# Solution
def bayes_factor_normal(data, sigma, tau):
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    z = xbar / (sigma / np.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))
    p_h0 = stats.norm.pdf(xbar, loc=0.0, scale=sigma / np.sqrt(n))
    p_h1 = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(sigma**2 / n + tau**2))
    bf10 = p_h1 / p_h0
    return z, p_value, bf10

data = np.array([0.10, 0.25, 0.28, 0.34, 0.31, 0.22])
z, p_value, bf10 = bayes_factor_normal(data, 0.25, 0.3)

header("Exercise 5: Bayes Factor vs p-Value")
print(f"z-score : {z:.6f}")
print(f"p-value : {p_value:.6f}")
print(f"BF10    : {bf10:.6f}")
check_true("Bayes factor is positive", bf10 > 0)
check_true("p-value lies in [0, 1]", 0 <= p_value <= 1)
print("\nTakeaway: A p-value measures surprise under H0, while a Bayes factor compares integrated evidence under H1 and H0.")
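
One practical difference the takeaway hints at: BF10 depends on the prior scale tau under H1, while the p-value does not. Very diffuse priors can even favor H0 (the Jeffreys-Lindley effect). A small sensitivity sweep (the grid of tau values is illustrative):

```python
import numpy as np
from scipy import stats

data = np.array([0.10, 0.25, 0.28, 0.34, 0.31, 0.22])
sigma = 0.25
n, xbar = len(data), data.mean()

bf = {}
for tau in [0.05, 0.1, 0.3, 1.0, 3.0]:
    # Marginal likelihood of xbar under each hypothesis
    p_h0 = stats.norm.pdf(xbar, 0.0, sigma / np.sqrt(n))
    p_h1 = stats.norm.pdf(xbar, 0.0, np.sqrt(sigma**2 / n + tau**2))
    bf[tau] = p_h1 / p_h0
    print(f"tau={tau:4.2f}  BF10={bf[tau]:8.3f}")
```

The p-value is fixed by the data alone, but the evidence ratio shifts with how H1 spreads its prior mass; reporting a Bayes factor without stating tau is incomplete.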

Exercise 6 ** — Hierarchical Partial Pooling

Four related groups have noisy sample means and sample sizes:

  • n_g = [5, 10, 40, 100]
  • ybar_g = [1.6, 0.8, 1.1, 1.0]

Assume observation variance sigma2=1.0, population mean mu0=1.0, and population variance tau2=0.25.

Task

  • (a) Compute the posterior mean for each group in the normal-normal hierarchical model.
  • (b) Verify that smaller groups are shrunk more toward mu0.

Code cell 20

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 21

# Solution
def partial_pooling_means(ybar_g, n_g, mu0, tau2, sigma2):
    ybar_g = np.asarray(ybar_g, dtype=float)
    n_g = np.asarray(n_g, dtype=float)
    weights = (n_g / sigma2) / (n_g / sigma2 + 1 / tau2)
    return weights * ybar_g + (1 - weights) * mu0

ybar_g = np.array([1.6, 0.8, 1.1, 1.0])
n_g = np.array([5, 10, 40, 100])
post_means = partial_pooling_means(ybar_g, n_g, 1.0, 0.25, 1.0)

header("Exercise 6: Hierarchical Partial Pooling")
print(f"Posterior means: {post_means}")
check_true("smallest group moves more than largest group", abs(post_means[0] - ybar_g[0]) > abs(post_means[-1] - ybar_g[-1]))
check_true("largest group stays near its raw mean", abs(post_means[-1] - ybar_g[-1]) < 0.01)
print("\nTakeaway: Partial pooling borrows strength, but it borrows most aggressively where data are sparse.")
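
The shrinkage weights themselves make point (b) concrete: the fraction of each posterior mean coming from the group's own data grows monotonically with n_g. A minimal sketch:

```python
import numpy as np

n_g = np.array([5, 10, 40, 100], dtype=float)
sigma2, tau2 = 1.0, 0.25

# Fraction of the posterior mean contributed by the group's own data.
w = (n_g / sigma2) / (n_g / sigma2 + 1 / tau2)
for n, wi in zip(n_g, w):
    print(f"n={n:5.0f}  weight on own data={wi:.4f}  shrinkage toward mu0={1 - wi:.4f}")
```

With tau2 = 0.25 the population prior acts like 1/tau2 = 4 pseudo-observations, so a group of 5 keeps barely over half the weight while a group of 100 keeps about 96%.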

Exercise 7 *** — Metropolis-Hastings for a Beta Posterior

Target posterior: theta | D ~ Beta(15, 7).

Task

  • (a) Implement a simple random-walk Metropolis-Hastings sampler on [0, 1].
  • (b) Estimate the posterior mean from the samples.
  • (c) Compare it to the exact Beta mean.

Code cell 23

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 24

# Solution
def mh_beta(a, b, n_steps=4000, proposal_std=0.08):
    chain = np.empty(n_steps)
    theta = 0.5
    for i in range(n_steps):
        proposal = theta + np.random.normal(0, proposal_std)
        if 0 < proposal < 1:
            log_acc = stats.beta.logpdf(proposal, a, b) - stats.beta.logpdf(theta, a, b)
            if np.log(np.random.rand()) < log_acc:
                theta = proposal
        chain[i] = theta
    return chain

chain = mh_beta(15, 7)
burn = 500
sample_mean = chain[burn:].mean()
exact_mean = 15 / (15 + 7)

header("Exercise 7: Metropolis-Hastings")
print(f"Sample mean: {sample_mean:.6f}")
print(f"Exact mean : {exact_mean:.6f}")
check_true("chain stays inside [0, 1]", np.all((chain >= 0) & (chain <= 1)))
check_true("sample mean is close to exact mean", abs(sample_mean - exact_mean) < 0.03)
print("\nTakeaway: MCMC approximates posterior expectations by sampling from an invariant chain instead of solving the posterior analytically.")
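
A common diagnostic the solution omits is the acceptance rate: tiny proposal steps accept almost everything but explore slowly, while very large steps are mostly rejected. A seeded variant of the sampler that also tracks acceptance (the step-size grid is illustrative):

```python
import numpy as np
from scipy import stats

def mh_beta_acc(a, b, n_steps=4000, proposal_std=0.08, seed=0):
    """Random-walk MH on (0, 1) that also reports the acceptance rate."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    theta, accepted = 0.5, 0
    for i in range(n_steps):
        prop = theta + rng.normal(0, proposal_std)
        if 0 < prop < 1:
            log_acc = stats.beta.logpdf(prop, a, b) - stats.beta.logpdf(theta, a, b)
            if np.log(rng.random()) < log_acc:
                theta, accepted = prop, accepted + 1
        chain[i] = theta
    return chain, accepted / n_steps

rates = {}
for step in [0.01, 0.08, 0.8]:
    chain, rates[step] = mh_beta_acc(15, 7, proposal_std=step)
    print(f"step={step:4.2f}  acceptance={rates[step]:.3f}  mean={chain[500:].mean():.4f}")
```

A middling acceptance rate (roughly 20-50% for random-walk samplers is a common rule of thumb) usually indicates the best trade-off between proposal boldness and rejection waste.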

Exercise 8 *** — ELBO in a Linear-Gaussian Latent Model

Let z ~ N(0, 1) and x | z ~ N(z, sigma_x^2) with sigma_x = 0.5.

For a variational approximation q(z | x) = N(a x, var_q), define

ELBO(x) = E_q[log p(x | z)] - KL(q(z | x) || p(z))

Task

  • (a) Implement the ELBO for one scalar x.
  • (b) Compare the ELBO under the exact posterior parameters and under a naive encoder choice.

Code cell 26

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 27

# Solution
def elbo_linear_gaussian(x, a, var_q, sigma_x):
    mu_q = a * x
    recon = -0.5 * np.log(2 * np.pi * sigma_x**2) - 0.5 * (((x - mu_q) ** 2 + var_q) / sigma_x**2)
    kl = 0.5 * (var_q + mu_q**2 - 1 - np.log(var_q))
    return recon - kl

sigma_x = 0.5
a_exact = 1 / (1 + sigma_x**2)
var_exact = sigma_x**2 / (1 + sigma_x**2)
elbo_exact = elbo_linear_gaussian(1.0, a_exact, var_exact, sigma_x)
elbo_naive = elbo_linear_gaussian(1.0, 1.0, sigma_x**2, sigma_x)

header("Exercise 8: ELBO Comparison")
print(f"ELBO with exact posterior parameters: {elbo_exact:.6f}")
print(f"ELBO with naive encoder parameters  : {elbo_naive:.6f}")
check_true("exact posterior gives larger ELBO", elbo_exact > elbo_naive)
check_true("ELBO values are finite", np.isfinite(elbo_exact) and np.isfinite(elbo_naive))
print("\nTakeaway: Variational inference turns posterior approximation into optimization over a tractable family.")
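
A useful sanity check on part (b): since ELBO(x) = log p(x) - KL(q || p(z|x)), the bound is tight exactly when q is the true posterior, and in this linear-Gaussian model the evidence has closed form, p(x) = N(x; 0, 1 + sigma_x^2). A quick numerical confirmation:

```python
import numpy as np
from scipy import stats

def elbo(x, a, var_q, sigma_x):
    mu_q = a * x
    recon = -0.5 * np.log(2 * np.pi * sigma_x**2) - 0.5 * ((x - mu_q) ** 2 + var_q) / sigma_x**2
    kl = 0.5 * (var_q + mu_q**2 - 1 - np.log(var_q))
    return recon - kl

sigma_x, x = 0.5, 1.0
a_exact = 1 / (1 + sigma_x**2)
var_exact = sigma_x**2 / (1 + sigma_x**2)

elbo_exact = elbo(x, a_exact, var_exact, sigma_x)
log_evidence = stats.norm.logpdf(x, 0.0, np.sqrt(1 + sigma_x**2))
print(f"ELBO at exact posterior: {elbo_exact:.6f}")
print(f"log evidence log p(x)  : {log_evidence:.6f}")
```

The gap between log p(x) and any other ELBO value is exactly the KL divergence from q to the true posterior, which is why maximizing the ELBO is equivalent to minimizing that KL.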

Exercise 9 *** — Thompson Sampling for a Bernoulli Bandit

Three prompt-selection policies have unknown success rates. Their true rates are:

true_rates = [0.08, 0.11, 0.15]

Task

  • (a) Implement Thompson sampling with Beta(1, 1) priors.
  • (b) Run it for 400 rounds.
  • (c) Verify that the best arm is selected most often.

Code cell 29

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 30

# Solution
def thompson_bandit(true_rates, rounds=400):
    true_rates = np.asarray(true_rates, dtype=float)
    a = np.ones_like(true_rates)
    b = np.ones_like(true_rates)
    pulls = np.zeros_like(true_rates, dtype=int)
    rewards = np.zeros_like(true_rates, dtype=int)
    for _ in range(rounds):
        sampled = np.random.beta(a, b)
        arm = np.argmax(sampled)
        reward = np.random.rand() < true_rates[arm]
        pulls[arm] += 1
        rewards[arm] += int(reward)
        a[arm] += int(reward)
        b[arm] += 1 - int(reward)
    return pulls, rewards, a / (a + b)

pulls, rewards, post_means = thompson_bandit([0.08, 0.11, 0.15], rounds=400)

header("Exercise 9: Thompson Sampling")
print(f"Pull counts    : {pulls}")
print(f"Reward counts  : {rewards}")
print(f"Posterior means: {post_means}")
check_true("best arm is sampled most often", pulls[2] == pulls.max())
check_true("posterior means stay inside [0, 1]", np.all((post_means >= 0) & (post_means <= 1)))
print("\nTakeaway: Thompson sampling turns posterior uncertainty directly into principled exploration.")
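
A standard way to quantify part (c) is expected cumulative regret: the gap between the best arm's rate and the chosen arm's rate, summed over rounds. A seeded sketch (the regret bookkeeping is an addition for illustration, not part of the exercise):

```python
import numpy as np

def thompson_regret(true_rates, rounds=400, seed=0):
    rng = np.random.default_rng(seed)
    true_rates = np.asarray(true_rates, dtype=float)
    a = np.ones_like(true_rates)
    b = np.ones_like(true_rates)
    best = true_rates.max()
    cum_regret = np.empty(rounds)
    total = 0.0
    for t in range(rounds):
        arm = np.argmax(rng.beta(a, b))   # sample one plausible world, act greedily in it
        reward = rng.random() < true_rates[arm]
        a[arm] += reward
        b[arm] += 1 - reward
        total += best - true_rates[arm]   # expected regret of this pull
        cum_regret[t] = total
    return cum_regret

regret = thompson_regret([0.08, 0.11, 0.15])
print(f"cumulative expected regret after 400 rounds: {regret[-1]:.2f}")
```

If the curve flattens over time, the sampler has effectively identified the best arm; a regret curve that keeps growing linearly signals that exploration never resolved.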

Exercise 10 *** — Posterior Predictive Calibration Check

A Beta-Binomial model predicts the number of successes in a future batch.

Part (a): Fit a Beta posterior from observed successes.

Part (b): Simulate the posterior predictive distribution for a future batch size.

Part (c): Compute a 90% posterior predictive interval.

Part (d): Explain why posterior predictive checks are different from parameter credible intervals.

Code cell 32

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 33

# Solution
from scipy import stats

a0, b0 = 2, 2
n_obs, k_obs = 80, 58
a_post, b_post = a0 + k_obs, b0 + n_obs - k_obs
m_future = 40
B = 20000
theta = np.random.beta(a_post, b_post, size=B)
y_future = np.random.binomial(m_future, theta)
ppi = np.percentile(y_future, [5, 95])
mean_pred = y_future.mean()
param_ci = stats.beta(a_post, b_post).ppf([0.05, 0.95])

header("Exercise 10: Posterior Predictive Calibration")
print(f"posterior theta: Beta({a_post}, {b_post})")
print(f"90% credible interval for theta: [{param_ci[0]:.3f}, {param_ci[1]:.3f}]")
print(f"future successes mean: {mean_pred:.2f} out of {m_future}")
print(f"90% posterior predictive interval: [{ppi[0]:.0f}, {ppi[1]:.0f}]")
check_true("predictive interval stays on the count scale", 0 <= ppi[0] <= ppi[1] <= m_future)
check_true("predictive variance exceeds plug-in binomial variance", y_future.var() > m_future * theta.mean() * (1 - theta.mean()))
print("\nTakeaway: parameter uncertainty and future-observation uncertainty are related but not the same object.")
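
By construction, an equal-tail 90% predictive interval should cover about 90% of posterior-predictive draws; with discrete counts the realized coverage typically sits slightly above nominal. A quick Monte Carlo check using the posterior parameters computed above, Beta(60, 24):

```python
import numpy as np

rng = np.random.default_rng(0)
a_post, b_post, m_future = 60, 24, 40

# Posterior-predictive simulation: draw theta, then a future batch count.
theta = rng.beta(a_post, b_post, size=20000)
y_future = rng.binomial(m_future, theta)

lo, hi = np.percentile(y_future, [5, 95])
coverage = np.mean((y_future >= lo) & (y_future <= hi))
print(f"90% predictive interval [{lo:.0f}, {hi:.0f}] covers {coverage:.3f} of draws")
```

The same simulate-and-count recipe extends to real calibration checks: compare held-out future observations against predictive intervals and see whether the nominal level matches the empirical hit rate.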

What to Review After Finishing

  • Checkpoint 1 — Can you derive a conjugate posterior without looking up the formula?
  • Checkpoint 2 — Can you explain why MAP is not the same as full Bayesian inference?
  • Checkpoint 3 — Can you state the difference between a Bayes factor and a p-value in one sentence?
  • Checkpoint 4 — Can you explain why mean-field VI can be overconfident?
  • Checkpoint 5 — Can you explain why Thompson sampling is a Bayesian decision rule?

References

  1. Gelman et al., Bayesian Data Analysis
  2. Murphy, Machine Learning: A Probabilistic Perspective
  3. Blei, Kucukelbir, McAuliffe, "Variational Inference: A Review for Statisticians"