Exercises Notebook
Converted from exercises.ipynb for web reading.
Bayesian Inference — Exercises
10 exercises covering conjugate Bayes, MAP estimation, posterior prediction, model comparison, sampling, variational ideas, and Bayesian decision-making.
| Format | Description |
|---|---|
| Problem | Markdown cell with task description |
| Your Solution | Code cell with scaffolding |
| Solution | Code cell with reference solution and checks |
Difficulty Levels
| Level | Exercises | Focus |
|---|---|---|
| * | 1-3 | Core posterior mechanics |
| ** | 4-6 | Prediction, model comparison, hierarchical shrinkage |
| *** | 7-10 | Sampling, ELBO reasoning, Bayesian decision-making |
Topic Map
| Topic | Exercise |
|---|---|
| Beta-Binomial updating | 1 |
| Gaussian-Gaussian shrinkage | 2 |
| MAP as regularized MLE | 3 |
| Posterior predictive distributions | 4 |
| Bayes factors | 5 |
| Hierarchical Bayes | 6 |
| MCMC | 7 |
| Variational inference / ELBO | 8 |
| Thompson sampling / Bayesian action | 9 |
| Posterior predictive calibration | 10 |
Code cell 2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False
mpl.rcParams.update({
    "figure.figsize": (10, 6),
    "figure.dpi": 120,
    "font.size": 13,
    "axes.titlesize": 15,
    "axes.labelsize": 13,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
    "legend.fontsize": 11,
    "legend.framealpha": 0.85,
    "lines.linewidth": 2.0,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "savefig.bbox": "tight",
    "savefig.dpi": 150,
})
np.random.seed(42)
print("Plot setup complete.")
Code cell 3
import numpy as np
import numpy.linalg as la
try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)

def header(title):
    print("\n" + "=" * len(title))
    print(title)
    print("=" * len(title))

def check_close(name, got, expected, tol=1e-8):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} — {name}")
    if not ok:
        print("  expected:", expected)
        print("  got     :", got)
    return ok

def check_true(name, cond):
    print(f"{'PASS' if cond else 'FAIL'} — {name}")
    return cond

# Shared utility functions used across exercises
from scipy import stats, special

def beta_posterior(alpha, beta, k, n):
    return alpha + k, beta + n - k
print("Exercise setup complete.")
Exercise 1 * — Beta-Binomial Posterior Update
A binary moderation pipeline flags 17 unsafe outputs out of 40 audited generations.
Task
- (a) Start from a prior Beta(alpha, beta) with alpha=2, beta=5.
- (b) Compute the posterior parameters after seeing the data.
- (c) Return the posterior mean unsafe rate.
- (d) Return a 95% equal-tail credible interval.
Code cell 5
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 6
# Solution
def beta_update(alpha, beta, k, n):
    a_post = alpha + k
    b_post = beta + n - k
    post_mean = a_post / (a_post + b_post)
    ci = stats.beta.ppf([0.025, 0.975], a_post, b_post)
    return a_post, b_post, post_mean, ci
a_post, b_post, post_mean, ci = beta_update(2, 5, 17, 40)
header("Exercise 1: Beta-Binomial Posterior Update")
print(f"Posterior parameters: Beta({a_post}, {b_post})")
print(f"Posterior mean: {post_mean:.6f}")
print(f"95% credible interval: {ci}")
check_close("posterior alpha", a_post, 19)
check_close("posterior beta", b_post, 28)
check_true("credible interval is inside [0, 1]", 0 <= ci[0] <= ci[1] <= 1)
print("\nTakeaway: Conjugate Bayes updates by adding pseudo-counts to observed counts.")
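As a cross-check on the conjugate shortcut, a brute-force grid approximation of prior times likelihood should recover the same posterior mean. This is an illustrative sketch, not part of the original exercise:

```python
import numpy as np
from scipy import stats

# Grid cross-check (sketch): the normalized product of the Beta(2, 5) prior
# and the Binomial(40, theta) likelihood at k = 17 should match the
# conjugate Beta(19, 28) posterior.
grid = np.linspace(1e-6, 1 - 1e-6, 20001)
unnorm = stats.beta.pdf(grid, 2, 5) * stats.binom.pmf(17, 40, grid)
weights = unnorm / unnorm.sum()            # discrete normalization on the grid
grid_mean = float((grid * weights).sum())
exact_mean = 19 / (19 + 28)
print(grid_mean, exact_mean)               # the two means agree closely
```

The grid route works for any one-dimensional prior; conjugacy just makes the integral unnecessary.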
Exercise 2 * — Gaussian Prior Plus Gaussian Likelihood
You monitor the mean latency drift of a model. The historical belief is mu ~ N(0, 1.5^2). You observe four latency deltas with known observation standard deviation sigma=0.8:
data = [0.4, 0.1, 0.5, 0.2]
Task
- (a) Compute the posterior mean.
- (b) Compute the posterior variance.
- (c) Verify that the posterior mean lies between the prior mean and the sample mean.
Code cell 8
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 9
# Solution
def gaussian_update(mu0, tau0, sigma, data):
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau0**2 + n * xbar / sigma**2)
    return post_mean, post_var
data = np.array([0.4, 0.1, 0.5, 0.2])
post_mean, post_var = gaussian_update(0.0, 1.5, 0.8, data)
header("Exercise 2: Gaussian-Gaussian Updating")
print(f"Posterior mean: {post_mean:.6f}")
print(f"Posterior var : {post_var:.6f}")
check_true("posterior mean is between prior mean and sample mean", 0.0 <= post_mean <= data.mean())
check_true("posterior variance is positive", post_var > 0)
print("\nTakeaway: Gaussian Bayesian updating is precision-weighted shrinkage.")
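To see the precision weighting explicitly, here is a small sketch (reusing the exercise's numbers) of how the shrinkage weight moves the posterior mean toward the sample mean as n grows:

```python
import numpy as np

# Precision-weighted shrinkage (sketch):
#   post_mean = w * xbar + (1 - w) * mu0,
#   w = (n / sigma^2) / (n / sigma^2 + 1 / tau0^2)
mu0, tau0, sigma, xbar = 0.0, 1.5, 0.8, 0.3   # same setup as the exercise
means = {}
for n in [1, 4, 100]:
    w = (n / sigma**2) / (n / sigma**2 + 1 / tau0**2)
    means[n] = w * xbar + (1 - w) * mu0
    print(f"n={n:3d}  weight on data={w:.4f}  posterior mean={means[n]:.4f}")
```

At n=4 this reproduces the posterior mean computed above; as n grows, the weight on the data approaches 1.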
Exercise 3 * — MAP as Ridge Regression
Consider a linear model with Gaussian noise variance sigma2=0.25 and Gaussian prior theta ~ N(0, lambda^{-1} I) where lambda=2.0.
Task
- (a) Compute the MAP estimator.
- (b) Compare its norm to the OLS solution.
- (c) Verify that MAP shrinks toward zero.
Code cell 11
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 12
# Solution
def ridge_map(X, y, lam, sigma2):
    return la.solve(X.T @ X / sigma2 + lam * np.eye(X.shape[1]), X.T @ y / sigma2)
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.2, 1.0, 1.8, 2.9])
theta_map = ridge_map(X, y, 2.0, 0.25)
theta_ols = la.lstsq(X, y, rcond=None)[0]
header("Exercise 3: MAP as Ridge")
print(f"OLS estimate : {theta_ols}")
print(f"MAP estimate : {theta_map}")
check_true("MAP has smaller or equal norm than OLS", la.norm(theta_map) <= la.norm(theta_ols) + 1e-12)
check_true("MAP is finite", np.isfinite(theta_map).all())
print("\nTakeaway: A Gaussian prior turns point estimation into regularized MAP optimization.")
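Multiplying the MAP normal equations through by sigma2 makes the ridge connection concrete: the MAP estimate coincides with ridge regression under an effective penalty lam * sigma2. A quick sketch on the same data:

```python
import numpy as np
import numpy.linalg as la

X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.2, 1.0, 1.8, 2.9])
lam, sigma2 = 2.0, 0.25

# MAP normal equations: (X^T X / sigma2 + lam I) theta = X^T y / sigma2
theta_map = la.solve(X.T @ X / sigma2 + lam * np.eye(2), X.T @ y / sigma2)
# Ridge normal equations with effective penalty lam * sigma2
theta_ridge = la.solve(X.T @ X + lam * sigma2 * np.eye(2), X.T @ y)
print(theta_map, theta_ridge)   # identical up to floating point
```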
Exercise 4 ** — Gamma-Poisson Posterior Predictive
An LLM service receives request counts per minute. Over 6 minutes, the counts are:
counts = [4, 3, 6, 5, 2, 7]
Assume X_i | lambda ~ Poisson(lambda) and lambda ~ Gamma(alpha=2, beta=1) with rate parameterization.
Task
- (a) Compute the posterior parameters.
- (b) Compute the posterior mean of lambda.
- (c) Compute the posterior predictive mean for the next minute.
Code cell 14
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 15
# Solution
def gamma_poisson_update(alpha, beta, counts):
    counts = np.asarray(counts, dtype=float)
    alpha_post = alpha + counts.sum()
    beta_post = beta + len(counts)
    post_mean = alpha_post / beta_post
    # For Gamma-Poisson, the posterior predictive mean equals the posterior mean of lambda
    pred_mean = post_mean
    return alpha_post, beta_post, post_mean, pred_mean
alpha_post, beta_post, post_mean, pred_mean = gamma_poisson_update(2.0, 1.0, [4, 3, 6, 5, 2, 7])
header("Exercise 4: Gamma-Poisson Posterior Predictive")
print(f"Posterior alpha: {alpha_post}")
print(f"Posterior beta : {beta_post}")
print(f"Posterior mean lambda: {post_mean:.6f}")
print(f"Predictive mean next count: {pred_mean:.6f}")
check_close("posterior alpha", alpha_post, 29.0)
check_close("posterior beta", beta_post, 7.0)
print("\nTakeaway: Posterior prediction averages over rate uncertainty instead of fixing one estimated rate.")
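The Gamma-Poisson predictive also has a closed form: it is Negative Binomial with r = alpha_post and success probability p = beta_post / (beta_post + 1). A hedged sketch checking this against a Monte Carlo draw from the posterior:

```python
import numpy as np
from scipy import stats

alpha_post, beta_post = 29.0, 7.0          # posterior from the exercise
# Closed-form posterior predictive: Negative Binomial(r, p)
pred = stats.nbinom(alpha_post, beta_post / (beta_post + 1))
print(pred.mean())                          # equals alpha_post / beta_post

# Monte Carlo version: draw lambda from the Gamma posterior, then a Poisson count
rng = np.random.default_rng(0)
lam_draws = rng.gamma(shape=alpha_post, scale=1 / beta_post, size=200_000)
y_draws = rng.poisson(lam_draws)
print(y_draws.mean())                       # agrees with the closed form
```

The extra spread of the Negative Binomial over a plain Poisson is exactly the rate uncertainty the takeaway refers to.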
Exercise 5 ** — Bayes Factor vs p-Value
You observe data = [0.10, 0.25, 0.28, 0.34, 0.31, 0.22] from a Gaussian model with known sigma=0.25.
Compare:
- H0: mu = 0
- H1: mu ~ N(0, tau^2) with tau = 0.3
Task
- (a) Compute the z-score and two-sided p-value.
- (b) Compute the Bayes factor BF10.
- (c) Explain briefly why the two numbers answer different questions.
Code cell 17
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 18
# Solution
def bayes_factor_normal(data, sigma, tau):
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    z = xbar / (sigma / np.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))
    p_h0 = stats.norm.pdf(xbar, loc=0.0, scale=sigma / np.sqrt(n))
    p_h1 = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(sigma**2 / n + tau**2))
    bf10 = p_h1 / p_h0
    return z, p_value, bf10
data = np.array([0.10, 0.25, 0.28, 0.34, 0.31, 0.22])
z, p_value, bf10 = bayes_factor_normal(data, 0.25, 0.3)
header("Exercise 5: Bayes Factor vs p-Value")
print(f"z-score : {z:.6f}")
print(f"p-value : {p_value:.6f}")
print(f"BF10 : {bf10:.6f}")
check_true("Bayes factor is positive", bf10 > 0)
check_true("p-value lies in [0, 1]", 0 <= p_value <= 1)
print("\nTakeaway: A p-value measures surprise under H0, while a Bayes factor compares integrated evidence under H1 and H0.")
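The solution computes the Bayes factor from xbar alone, which is valid here because xbar is a sufficient statistic. As a sanity check, integrating the full-data likelihood over the H1 prior on a grid gives the same ratio (a sketch, not part of the original cell):

```python
import numpy as np
from scipy import stats

data = np.array([0.10, 0.25, 0.28, 0.34, 0.31, 0.22])
sigma, tau = 0.25, 0.3
n, xbar = len(data), data.mean()

# Marginal likelihood under H1 by grid integration over mu ~ N(0, tau^2)
mu = np.linspace(-2, 2, 40001)
dmu = mu[1] - mu[0]
loglik = stats.norm.logpdf(data[:, None], loc=mu, scale=sigma).sum(axis=0)
m1 = np.sum(np.exp(loglik) * stats.norm.pdf(mu, 0, tau)) * dmu
m0 = np.exp(stats.norm.logpdf(data, 0, sigma).sum())     # H0: mu = 0
bf_grid = m1 / m0

# Sufficient-statistic shortcut used in the solution
bf_stat = (stats.norm.pdf(xbar, 0, np.sqrt(sigma**2 / n + tau**2))
           / stats.norm.pdf(xbar, 0, sigma / np.sqrt(n)))
print(bf_grid, bf_stat)   # the two ratios agree
```

The data-dependent factor common to both hypotheses cancels in the ratio, which is why the shortcut works.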
Exercise 6 ** — Hierarchical Partial Pooling
Four related groups have noisy sample means and sample sizes:
- n_g = [5, 10, 40, 100]
- ybar_g = [1.6, 0.8, 1.1, 1.0]
Assume observation variance sigma2=1.0, population mean mu0=1.0, and population variance tau2=0.25.
Task
- (a) Compute the posterior mean for each group in the normal-normal hierarchical model.
- (b) Verify that smaller groups are shrunk more toward mu0.
Code cell 20
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 21
# Solution
def partial_pooling_means(ybar_g, n_g, mu0, tau2, sigma2):
    ybar_g = np.asarray(ybar_g, dtype=float)
    n_g = np.asarray(n_g, dtype=float)
    weights = (n_g / sigma2) / (n_g / sigma2 + 1 / tau2)
    return weights * ybar_g + (1 - weights) * mu0
ybar_g = np.array([1.6, 0.8, 1.1, 1.0])
n_g = np.array([5, 10, 40, 100])
post_means = partial_pooling_means(ybar_g, n_g, 1.0, 0.25, 1.0)
header("Exercise 6: Hierarchical Partial Pooling")
print(f"Posterior means: {post_means}")
check_true("smallest group moves more than largest group", abs(post_means[0] - ybar_g[0]) > abs(post_means[-1] - ybar_g[-1]))
check_true("largest group stays near its raw mean", abs(post_means[-1] - ybar_g[-1]) < 0.01)
print("\nTakeaway: Partial pooling borrows strength, but it borrows most aggressively where data are sparse.")
Exercise 7 *** — Metropolis-Hastings for a Beta Posterior
Target posterior: theta | D ~ Beta(15, 7).
Task
- (a) Implement a simple random-walk Metropolis-Hastings sampler on [0, 1].
- (b) Estimate the posterior mean from the samples.
- (c) Compare it to the exact Beta mean.
Code cell 23
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 24
# Solution
def mh_beta(a, b, n_steps=4000, proposal_std=0.08):
    chain = np.empty(n_steps)
    theta = 0.5
    for i in range(n_steps):
        proposal = theta + np.random.normal(0, proposal_std)
        if 0 < proposal < 1:
            log_acc = stats.beta.logpdf(proposal, a, b) - stats.beta.logpdf(theta, a, b)
            if np.log(np.random.rand()) < log_acc:
                theta = proposal
        chain[i] = theta
    return chain
chain = mh_beta(15, 7)
burn = 500
sample_mean = chain[burn:].mean()
exact_mean = 15 / (15 + 7)
header("Exercise 7: Metropolis-Hastings")
print(f"Sample mean: {sample_mean:.6f}")
print(f"Exact mean : {exact_mean:.6f}")
check_true("chain stays inside [0, 1]", np.all((chain >= 0) & (chain <= 1)))
check_true("sample mean is close to exact mean", abs(sample_mean - exact_mean) < 0.03)
print("\nTakeaway: MCMC approximates posterior expectations by sampling from an invariant chain instead of solving the posterior analytically.")
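A useful habit with random-walk MH is to track the acceptance rate, since a badly scaled proposal_std shows up there first. This variant (a sketch; the helper name is ours, not from the original) returns the rate alongside the chain:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def mh_beta_with_rate(a, b, n_steps=4000, proposal_std=0.08):
    """Random-walk MH on (0, 1); also reports the fraction of accepted moves."""
    chain = np.empty(n_steps)
    theta, accepted = 0.5, 0
    for i in range(n_steps):
        proposal = theta + rng.normal(0, proposal_std)
        if 0 < proposal < 1:
            log_acc = stats.beta.logpdf(proposal, a, b) - stats.beta.logpdf(theta, a, b)
            if np.log(rng.random()) < log_acc:
                theta, accepted = proposal, accepted + 1
        chain[i] = theta
    return chain, accepted / n_steps

chain, rate = mh_beta_with_rate(15, 7)
print(f"acceptance rate: {rate:.2f}")
print(f"post-burn-in mean: {chain[500:].mean():.4f} (exact {15/22:.4f})")
```

Rates near 1 suggest the chain is taking tiny steps and mixing slowly; rates near 0 suggest the proposal is too wide.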
Exercise 8 *** — ELBO in a Linear-Gaussian Latent Model
Let z ~ N(0, 1) and x | z ~ N(z, sigma_x^2) with sigma_x = 0.5.
For a variational approximation q(z | x) = N(a x, var_q), define
ELBO(x) = E_q[log p(x | z)] - KL(q(z | x) || p(z))
Task
- (a) Implement the ELBO for one scalar x.
- (b) Compare the ELBO under the exact posterior parameters and under a naive encoder choice.
Code cell 26
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 27
# Solution
def elbo_linear_gaussian(x, a, var_q, sigma_x):
    mu_q = a * x
    recon = -0.5 * np.log(2 * np.pi * sigma_x**2) - 0.5 * (((x - mu_q) ** 2 + var_q) / sigma_x**2)
    kl = 0.5 * (var_q + mu_q**2 - 1 - np.log(var_q))
    return recon - kl
sigma_x = 0.5
a_exact = 1 / (1 + sigma_x**2)
var_exact = sigma_x**2 / (1 + sigma_x**2)
elbo_exact = elbo_linear_gaussian(1.0, a_exact, var_exact, sigma_x)
elbo_naive = elbo_linear_gaussian(1.0, 1.0, sigma_x**2, sigma_x)
header("Exercise 8: ELBO Comparison")
print(f"ELBO with exact posterior parameters: {elbo_exact:.6f}")
print(f"ELBO with naive encoder parameters : {elbo_naive:.6f}")
check_true("exact posterior gives larger ELBO", elbo_exact > elbo_naive)
check_true("ELBO values are finite", np.isfinite(elbo_exact) and np.isfinite(elbo_naive))
print("\nTakeaway: Variational inference turns posterior approximation into optimization over a tractable family.")
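In this model the bound can be checked exactly: with q set to the true posterior, the ELBO is tight and equals the log evidence log N(x; 0, 1 + sigma_x^2). A short verification sketch:

```python
import numpy as np
from scipy import stats

def elbo(x, a, var_q, sigma_x):
    # Same ELBO expression as in the reference solution
    mu_q = a * x
    recon = -0.5 * np.log(2 * np.pi * sigma_x**2) - 0.5 * (((x - mu_q) ** 2 + var_q) / sigma_x**2)
    kl = 0.5 * (var_q + mu_q**2 - 1 - np.log(var_q))
    return recon - kl

x, sigma_x = 1.0, 0.5
a_exact = 1 / (1 + sigma_x**2)
var_exact = sigma_x**2 / (1 + sigma_x**2)
log_evidence = stats.norm.logpdf(x, 0, np.sqrt(1 + sigma_x**2))
print(elbo(x, a_exact, var_exact, sigma_x), log_evidence)   # equal: the bound is tight
```

The gap between log evidence and ELBO is KL(q || posterior), which vanishes only when q is exact; the naive encoder's lower ELBO is exactly that KL gap.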
Exercise 9 *** — Thompson Sampling for a Bernoulli Bandit
Three prompt-selection policies have unknown success rates. Their true rates are:
true_rates = [0.08, 0.11, 0.15]
Task
- (a) Implement Thompson sampling with Beta(1, 1) priors.
- (b) Run it for 400 rounds.
- (c) Verify that the best arm is selected most often.
Code cell 29
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 30
# Solution
def thompson_bandit(true_rates, rounds=400):
    true_rates = np.asarray(true_rates, dtype=float)
    a = np.ones_like(true_rates)
    b = np.ones_like(true_rates)
    pulls = np.zeros_like(true_rates, dtype=int)
    rewards = np.zeros_like(true_rates, dtype=int)
    for _ in range(rounds):
        sampled = np.random.beta(a, b)
        arm = np.argmax(sampled)
        reward = np.random.rand() < true_rates[arm]
        pulls[arm] += 1
        rewards[arm] += int(reward)
        a[arm] += int(reward)
        b[arm] += 1 - int(reward)
    return pulls, rewards, a / (a + b)
pulls, rewards, post_means = thompson_bandit([0.08, 0.11, 0.15], rounds=400)
header("Exercise 9: Thompson Sampling")
print(f"Pull counts : {pulls}")
print(f"Reward counts : {rewards}")
print(f"Posterior means: {post_means}")
check_true("best arm is sampled most often", pulls[2] == pulls.max())
check_true("posterior means stay inside [0, 1]", np.all((post_means >= 0) & (post_means <= 1)))
print("\nTakeaway: Thompson sampling turns posterior uncertainty directly into principled exploration.")
Exercise 10 *** — Posterior Predictive Calibration Check
A Beta-Binomial model predicts the number of successes in a future batch.
Task
- (a) Fit a Beta posterior from observed successes.
- (b) Simulate the posterior predictive distribution for a future batch size.
- (c) Compute a 90% posterior predictive interval.
- (d) Explain why posterior predictive checks are different from parameter credible intervals.
Code cell 32
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 33
# Solution
from scipy import stats
a0, b0 = 2, 2
n_obs, k_obs = 80, 58
a_post, b_post = a0 + k_obs, b0 + n_obs - k_obs
m_future = 40
B = 20000
theta = np.random.beta(a_post, b_post, size=B)
y_future = np.random.binomial(m_future, theta)
ppi = np.percentile(y_future, [5, 95])
mean_pred = y_future.mean()
param_ci = stats.beta(a_post, b_post).ppf([0.05, 0.95])
header('Exercise 10: Posterior Predictive Calibration')
print(f'posterior theta: Beta({a_post}, {b_post})')
print(f'90% credible interval for theta: [{param_ci[0]:.3f}, {param_ci[1]:.3f}]')
print(f'future successes mean: {mean_pred:.2f} out of {m_future}')
print(f'90% posterior predictive interval: [{ppi[0]:.0f}, {ppi[1]:.0f}]')
check_true('Predictive interval lies on the count scale', 0 <= ppi[0] <= ppi[1] <= m_future)
check_true('Predictive variance exceeds pure binomial variance', y_future.var() > m_future * theta.mean() * (1 - theta.mean()))
print('\nTakeaway: parameter uncertainty and future-observation uncertainty are related but not the same object.')
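Because the Beta-Binomial predictive is tractable, the simulated interval can be checked against scipy's closed-form distribution (a cross-check sketch, not part of the original cell):

```python
from scipy import stats

a_post, b_post, m_future = 60, 24, 40       # posterior and batch size from above
pred = stats.betabinom(m_future, a_post, b_post)
print(pred.mean())                          # m_future * a_post / (a_post + b_post)
print(pred.ppf(0.05), pred.ppf(0.95))       # exact 90% predictive interval endpoints
```

The Monte Carlo interval from the solution should bracket the same counts up to sampling noise.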
What to Review After Finishing
- Checkpoint 1 — Can you derive a conjugate posterior without looking up the formula?
- Checkpoint 2 — Can you explain why MAP is not the same as full Bayesian inference?
- Checkpoint 3 — Can you state the difference between a Bayes factor and a p-value in one sentence?
- Checkpoint 4 — Can you explain why mean-field VI can be overconfident?
- Checkpoint 5 — Can you explain why Thompson sampling is a Bayesian decision rule?
References
- Gelman et al., Bayesian Data Analysis
- Murphy, Machine Learning: A Probabilistic Perspective
- Blei, Kucukelbir, McAuliffe, "Variational Inference: A Review for Statisticians"