Exercises Notebook
Converted from `exercises.ipynb` for web reading.
Regression Analysis — Exercises
10 exercises covering OLS, projection geometry, intervals, diagnostics, regularization, GLMs, influence, ML-style linear probes, and calibration.
| Format | Description |
|---|---|
| Problem | Markdown cell with the exercise statement |
| Your Solution | Runnable scaffold cell |
| Solution | Reference implementation with checks and takeaway |
Difficulty Levels
| Level | Exercises | Focus |
|---|---|---|
| * | 1-3 | Core OLS mechanics and interval reasoning |
| ** | 4-7 | Diagnostics, regularization, logistic and Poisson regression |
| *** | 8-10 | Influence analysis, ML-facing probe intuition, and calibration |
Topic Map
| Topic | Exercise |
|---|---|
| Covariance-to-slope connection | 1 |
| Matrix OLS and orthogonality | 2 |
| Confidence vs prediction intervals | 3 |
| Multicollinearity and VIF | 4 |
| Ridge vs Lasso | 5 |
| Logistic regression | 6 |
| Poisson / rate modeling | 7 |
| Leverage and Cook's distance | 8 |
| Linear probes and regularization | 9 |
Additional applied exercise: Exercise 10, a calibration curve for logistic regression scores. Can you connect the computation to statistical ML practice?
Code cell 2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False
mpl.rcParams.update({
    "figure.figsize": (10, 6),
    "figure.dpi": 120,
    "font.size": 13,
    "axes.titlesize": 15,
    "axes.labelsize": 13,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
    "legend.fontsize": 11,
    "legend.framealpha": 0.85,
    "lines.linewidth": 2.0,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "savefig.bbox": "tight",
    "savefig.dpi": 150,
})
np.random.seed(42)
print("Plot setup complete.")
Code cell 3
import numpy as np
import numpy.linalg as la
from scipy import stats
from sklearn.linear_model import Ridge, Lasso, LogisticRegression, PoissonRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)

def header(title):
    print("\n" + "=" * len(title))
    print(title)
    print("=" * len(title))

def check_close(name, got, expected, tol=1e-8):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} — {name}")
    if not ok:
        print(" expected:", expected)
        print(" got :", got)
    return ok

def check_true(name, cond):
    print(f"{'PASS' if cond else 'FAIL'} — {name}")
    return cond

def add_intercept(X):
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(X.shape[0]), X])

def ols_beta(X, y):
    # Least-squares coefficients via lstsq (stable even when X^T X is ill-conditioned).
    return la.lstsq(np.asarray(X, dtype=float), np.asarray(y, dtype=float), rcond=None)[0]

def rss(y, fitted):
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return float(np.sum((y - fitted) ** 2))

def vif_scores(X):
    # VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others.
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        beta = ols_beta(add_intercept(others), target)
        fit = add_intercept(others) @ beta
        ss_res = np.sum((target - fit) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        scores.append(1 / (1 - min(r2, 0.999999999)))
    return np.array(scores)

def prediction_interval(X, y, x_star, alpha=0.05):
    # Returns the fitted mean at x_star plus t-based confidence and prediction intervals.
    beta = ols_beta(X, y)
    fitted = X @ beta
    n, p = X.shape
    s2 = rss(y, fitted) / (n - p)
    xtx_inv = la.pinv(X.T @ X)
    x_star = np.asarray(x_star, dtype=float)
    mean = float(x_star @ beta)
    tcrit = stats.t.ppf(1 - alpha / 2, n - p)
    mean_se = np.sqrt(s2 * x_star @ xtx_inv @ x_star)
    pred_se = np.sqrt(s2 * (1 + x_star @ xtx_inv @ x_star))
    return mean, (mean - tcrit * mean_se, mean + tcrit * mean_se), (mean - tcrit * pred_se, mean + tcrit * pred_se)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
print("Regression helper setup complete.")
Exercise 1 * — From Covariance to the OLS Slope
Let x = [1, 2, 3, 5, 6] and y = [2, 4, 5, 7, 9].
Task
- (a) Compute the simple-regression slope from covariance divided by variance.
- (b) Compute the slope from the OLS matrix formula.
- (c) Verify the two answers agree.
- (d) Compute the intercept.
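For reference, parts (a), (b), and (d) correspond to the standard identities

$$ \hat\beta_1 = \frac{\widehat{\operatorname{Cov}}(x, y)}{\widehat{\operatorname{Var}}(x)} = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}, \qquad \hat\beta = (X^\top X)^{-1} X^\top y, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, $$

where the matrix formula assumes a design matrix with an intercept column, as built by the `add_intercept` helper above.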
Code cell 5
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 6
# Solution
def slope_from_covariance(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept
x = np.array([1., 2., 3., 5., 6.])
y = np.array([2., 4., 5., 7., 9.])
slope, intercept = slope_from_covariance(x, y)
beta = ols_beta(add_intercept(x), y)
header("Exercise 1: Covariance-to-slope derivation")
print(f"Slope from covariance formula: {slope:.6f}")
print(f"Intercept : {intercept:.6f}")
check_close("slope matches OLS", slope, beta[1])
check_close("intercept matches OLS", intercept, beta[0])
print("\nTakeaway: In simple regression, the OLS slope is exactly sample covariance divided by predictor variance.")
Exercise 2 * — Matrix OLS and Residual Orthogonality
Use the design matrix X = [[1, 0], [1, 1], [1, 2], [1, 3]] (an intercept column followed by a single predictor) and response vector y = [1, 2, 2, 4].
Task
- (a) Compute the OLS coefficients.
- (b) Compute the fitted values and residuals.
- (c) Verify that X^T e = 0, where e is the residual vector.
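Part (c) is the normal-equations identity. When X has full column rank,

$$ X^\top (y - X\hat\beta) = 0 \quad\Longleftrightarrow\quad \hat y = X (X^\top X)^{-1} X^\top y, $$

so the fitted vector is the orthogonal projection of y onto the column space of X, and the residual \(e = y - \hat y\) is orthogonal to every column of X.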
Code cell 8
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 9
# Solution
def solve_ols_system(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = ols_beta(X, y)
    fitted = X @ beta
    resid = y - fitted
    ortho = X.T @ resid
    return beta, fitted, resid, ortho
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([1., 2., 2., 4.])
beta, fitted, resid, ortho = solve_ols_system(X, y)
header("Exercise 2: Matrix OLS and orthogonality")
print("beta :", beta)
print("fitted :", fitted)
print("resid :", resid)
print("X^T e :", ortho)
check_close("normal equations residual is zero", ortho, np.zeros(2))
check_close("fitted + resid reconstructs y", fitted + resid, y)
print("\nTakeaway: OLS solves a projection problem, so the residual vector is orthogonal to every modeled direction.")
Exercise 3 * — Confidence Interval vs Prediction Interval
Let x = [0, 1, 2, 3, 4] and y = [1.0, 1.8, 3.2, 4.1, 5.1].
Task
- (a) Fit the linear model with intercept.
- (b) Compute the predicted mean at x* = 2.5.
- (c) Compute a 95% confidence interval for the mean response.
- (d) Compute a 95% prediction interval for a new observation.
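For reference, the `prediction_interval` helper used in the solution implements the usual t-based formulas. With \(\hat\sigma^2 = \mathrm{RSS}/(n - p)\) and \(v = x_*^\top (X^\top X)^{-1} x_*\),

$$ \text{CI:}\;\; \hat y_* \pm t_{n-p,\,1-\alpha/2}\, \hat\sigma \sqrt{v}, \qquad \text{PI:}\;\; \hat y_* \pm t_{n-p,\,1-\alpha/2}\, \hat\sigma \sqrt{1 + v}. $$

The extra 1 under the square root is the variance contribution of a new observation's noise, which is why the prediction interval is always wider.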
Code cell 11
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 12
# Solution
def regression_intervals(x, y, x_star):
    X = add_intercept(x)
    mean, ci, pi = prediction_interval(X, np.asarray(y, dtype=float), np.array([1.0, float(x_star)]))
    return mean, ci, pi
x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.0, 1.8, 3.2, 4.1, 5.1])
mean, ci, pi = regression_intervals(x, y, 2.5)
header("Exercise 3: Regression intervals")
print(f"Predicted mean at x*=2.5: {mean:.6f}")
print(f"95% CI for mean response: {ci}")
print(f"95% prediction interval : {pi}")
check_true("prediction interval wider than confidence interval", (pi[1] - pi[0]) > (ci[1] - ci[0]))
check_true("predicted mean lies inside both intervals", ci[0] <= mean <= ci[1] and pi[0] <= mean <= pi[1])
print("\nTakeaway: Prediction intervals are wider because they include fresh observation noise in addition to mean-estimation uncertainty.")
Exercise 4 ** — Multicollinearity, Condition Number, and VIF
Build a design with highly correlated predictors:
- x1 = [0, 1, 2, 3, 4, 5]
- x2 = [0.1, 1.1, 2.1, 3.0, 4.2, 5.1]
- x3 = [2, 1, 0, 1, 2, 3]
Task
- (a) Compute the condition number of the predictor matrix.
- (b) Compute VIF values.
- (c) Decide whether collinearity looks serious.
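As a reminder, the `vif_scores` helper regresses each standardized column on the others and reports

$$ \mathrm{VIF}_j = \frac{1}{1 - R_j^2}, $$

so a value above roughly 10 means the other predictors explain more than 90% of that column's variance. The condition number of the standardized design tells a similar story through the ratio of its largest to smallest singular value.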
Code cell 14
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 15
# Solution
def collinearity_report(X):
    X = np.asarray(X, dtype=float)
    X_std = StandardScaler().fit_transform(X)
    condition = la.cond(X_std)
    vifs = vif_scores(X_std)
    serious = bool(np.max(vifs) > 10 or condition > 20)
    return condition, vifs, serious
X = np.array([
[0.0, 0.1, 2.0],
[1.0, 1.1, 1.0],
[2.0, 2.1, 0.0],
[3.0, 3.0, 1.0],
[4.0, 4.2, 2.0],
[5.0, 5.1, 3.0],
])
condition, vifs, serious = collinearity_report(X)
header("Exercise 4: Multicollinearity diagnostics")
print(f"Condition number: {condition:.6f}")
print("VIFs:", np.round(vifs, 6))
print("Serious collinearity:", serious)
check_true("one VIF is large", np.max(vifs) > 10)
check_true("condition number is elevated", condition > 20)
print("\nTakeaway: Collinearity inflates coefficient uncertainty even when the model can still fit the response well.")
Exercise 5 ** — Ridge vs Lasso Under Correlated Predictors
Create a small prediction problem with redundant predictors and compare penalties.
Task
- (a) Fit Ridge and Lasso after standardizing the predictors.
- (b) Compare coefficient norms.
- (c) Count how many exact zeros Lasso produces.
- (d) Compare test MSE.
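For orientation, the two penalized objectives (written in the scikit-learn parameterization, where `alpha` plays the role of \(\lambda\)) are

$$ \hat\beta^{\text{ridge}} = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2, \qquad \hat\beta^{\text{lasso}} = \arg\min_\beta \; \tfrac{1}{2n} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1. $$

The \(\ell_1\) penalty is not differentiable at zero, which is what lets Lasso set coefficients exactly to zero while Ridge only shrinks them.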
Code cell 17
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 18
# Solution
def ridge_lasso_compare(X_train, y_train, X_test, y_test):
    scaler = StandardScaler().fit(X_train)
    Xtr = scaler.transform(X_train)
    Xte = scaler.transform(X_test)
    ridge = Ridge(alpha=3.0).fit(Xtr, y_train)
    lasso = Lasso(alpha=0.15, max_iter=20000).fit(Xtr, y_train)
    out = {
        "ridge_norm": la.norm(ridge.coef_),
        "lasso_norm": la.norm(lasso.coef_),
        "lasso_nonzero": int(np.sum(np.abs(lasso.coef_) > 1e-6)),
        "ridge_mse": mean_squared_error(y_test, ridge.predict(Xte)),
        "lasso_mse": mean_squared_error(y_test, lasso.predict(Xte)),
    }
    return out
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=120)
true_beta = np.array([2.0, 2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=1.0, size=120)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
out = ridge_lasso_compare(X_train, y_train, X_test, y_test)
header("Exercise 5: Ridge vs Lasso")
print(out)
check_true("Lasso yields some sparsity", out["lasso_nonzero"] < 6)
check_true("Coefficient norms are finite", np.isfinite(out["ridge_norm"]) and np.isfinite(out["lasso_norm"]))
print("\nTakeaway: Ridge spreads weight across correlated predictors, while Lasso tends to select a sparse subset.")
Exercise 6 ** — Logistic Regression and Odds Ratios
Use the binary dataset
- X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
- y = [0, 0, 0, 1, 1, 1]
Task
- (a) Fit logistic regression.
- (b) Report the predicted probability at x = 0.25.
- (c) Convert the slope into an odds ratio.
- (d) State whether the predicted class at x = 0.25 is 0 or 1 under threshold 0.5.
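The model behind parts (b) and (c) is the logit link

$$ \log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x, \qquad p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, $$

so \(e^{\beta_1}\) is the multiplicative change in the odds of y = 1 for a one-unit increase in x.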
Code cell 20
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 21
# Solution
def logistic_summary(X, y, x_star):
    # Large C means very weak L2 regularization.
    model = LogisticRegression(C=1e6, solver='lbfgs', max_iter=2000)
    model.fit(X, y)
    prob = model.predict_proba(x_star)[0, 1]
    odds_ratio = float(np.exp(model.coef_[0, 0]))
    pred = int(prob >= 0.5)
    return model.intercept_[0], model.coef_[0, 0], prob, odds_ratio, pred
X = np.array([[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
intercept, slope, prob, odds_ratio, pred = logistic_summary(X, y, np.array([[0.25]]))
header("Exercise 6: Logistic regression summary")
print(f"Intercept : {intercept:.6f}")
print(f"Slope : {slope:.6f}")
print(f"Predicted probability: {prob:.6f}")
print(f"Odds ratio : {odds_ratio:.6f}")
print(f"Predicted class : {pred}")
check_true("probability lies in [0, 1]", 0 <= prob <= 1)
check_true("odds ratio is positive", odds_ratio > 0)
print("\nTakeaway: Logistic regression keeps prediction on the probability scale while giving interpretable multiplicative odds changes.")
Exercise 7 ** — Poisson Regression for Rates
Use count data with varying exposure.
Task
- (a) Fit a Poisson model on predictors [x, log(exposure)].
- (b) Return predicted counts for the first two observations.
- (c) Return the multiplicative rate ratio for a one-unit change in x.
- (d) Verify predictions are positive.
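As background for part (c): with the log link, and with log(exposure) entering as an ordinary predictor (as in the reference solution, rather than as a fixed offset), the fitted model is

$$ \log \mathbb{E}[y] = \beta_0 + \beta_1 x + \beta_2 \log(\text{exposure}), $$

so \(e^{\beta_1}\) is the multiplicative rate ratio for a one-unit change in x; a coefficient \(\beta_2\) near 1 would correspond to the classical exposure-offset formulation.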
Code cell 23
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 24
# Solution
def poisson_rate_summary(X, y):
    model = PoissonRegressor(alpha=1e-8, max_iter=3000)
    model.fit(X, y)
    preds = model.predict(X[:2])
    rate_ratio = float(np.exp(model.coef_[0]))
    return model.intercept_, model.coef_, preds, rate_ratio
x = np.array([-1.0, -0.3, 0.2, 0.5, 1.0, 1.4])
exposure = np.array([1.0, 1.5, 1.0, 2.0, 1.2, 1.8])
y = np.array([0, 1, 2, 4, 4, 8])
X = np.column_stack([x, np.log(exposure)])
intercept, coef, preds, rate_ratio = poisson_rate_summary(X, y)
header("Exercise 7: Poisson regression for rates")
print(f"Intercept: {intercept:.6f}")
print("Coefficients:", np.round(coef, 6))
print("First two predicted counts:", np.round(preds, 6))
print(f"Rate ratio for x: {rate_ratio:.6f}")
check_true("predicted counts are positive", np.all(preds > 0))
check_true("rate ratio is positive", rate_ratio > 0)
print("\nTakeaway: The Poisson log link converts additive effects on the linear predictor into multiplicative changes on the rate scale.")
Exercise 8 *** — Leverage, Influence, and Cook's Distance
Consider the regression dataset
- x = [0.0, 0.5, 1.0, 1.5, 2.0, 5.0]
- y = [1.0, 1.4, 2.0, 2.5, 2.9, 7.4]
Task
- (a) Compute the leverage values.
- (b) Compute Cook's distance.
- (c) Identify the most influential observation.
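For reference, with hat matrix \(H = X (X^\top X)^{-1} X^\top\), leverage is \(h_i = H_{ii}\), and the Cook's distance computed in the solution is

$$ D_i = \frac{e_i^2}{p\, \hat\sigma^2} \cdot \frac{h_i}{(1 - h_i)^2}, $$

where p is the number of columns of X and \(\hat\sigma^2 = \mathrm{RSS}/(n - p)\). A large \(D_i\) combines the size of the residual with how far the point sits out in predictor space.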
Code cell 26
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 27
# Solution
def influence_summary(x, y):
    X = add_intercept(x)
    beta = ols_beta(X, y)
    fitted = X @ beta
    resid = y - fitted
    H = X @ la.pinv(X.T @ X) @ X.T
    h = np.diag(H)
    s2 = rss(y, fitted) / (len(y) - X.shape[1])
    cooks = (resid**2 / (X.shape[1] * s2)) * (h / (1 - h) ** 2)
    idx = int(np.argmax(cooks))
    return h, cooks, idx
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 5.0])
y = np.array([1.0, 1.4, 2.0, 2.5, 2.9, 7.4])
h, cooks, idx = influence_summary(x, y)
header("Exercise 8: Influence diagnostics")
print("Leverage:", np.round(h, 6))
print("Cook's distance:", np.round(cooks, 6))
print("Most influential index:", idx)
check_true("last observation has highest leverage", int(np.argmax(h)) == len(x) - 1)
check_true("same observation is most influential", idx == len(x) - 1)
print("\nTakeaway: A point can matter a lot because it sits far out in predictor space, not only because its residual is large.")
Exercise 9 *** — Linear Probe with Regularized Readout
Use frozen embedding-like features and compare weak vs strong regularization.
Task
- (a) Fit logistic probes with weak and strong L2 regularization.
- (b) Report test accuracy for both.
- (c) Report the ratio of coefficient norms.
- (d) Explain which model is more heavily shrunk.
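Both probes minimize the same L2-penalized logistic loss; in scikit-learn's parameterization, C is the inverse regularization strength, so smaller C means stronger shrinkage:

$$ \min_{w, b} \; \tfrac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \log\!\left(1 + e^{-\tilde y_i (w^\top x_i + b)}\right), \qquad \tilde y_i \in \{-1, +1\}. $$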
Code cell 29
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 30
# Solution
def probe_compare(emb, labels):
    X_train, X_test, y_train, y_test = train_test_split(emb, labels, test_size=0.3, random_state=42)
    weak = LogisticRegression(C=10.0, solver='lbfgs', max_iter=2000)
    strong = LogisticRegression(C=0.1, solver='lbfgs', max_iter=2000)
    weak.fit(X_train, y_train)
    strong.fit(X_train, y_train)
    weak_acc = accuracy_score(y_test, weak.predict(X_test))
    strong_acc = accuracy_score(y_test, strong.predict(X_test))
    norm_ratio = la.norm(strong.coef_) / la.norm(weak.coef_)
    return weak_acc, strong_acc, norm_ratio
rng = np.random.default_rng(42)
emb = rng.normal(size=(600, 12))
true_w = rng.normal(size=12)
labels = (emb @ true_w + 0.4 * rng.normal(size=600) > 0).astype(int)
weak_acc, strong_acc, norm_ratio = probe_compare(emb, labels)
header("Exercise 9: Linear probe regularization")
print(f"Weak-regularization accuracy : {weak_acc:.6f}")
print(f"Strong-regularization accuracy: {strong_acc:.6f}")
print(f"Strong/weak coefficient-norm ratio: {norm_ratio:.6f}")
check_true("stronger regularization shrinks the probe", norm_ratio < 1.0)
check_true("both probes are useful classifiers", weak_acc > 0.7 and strong_acc > 0.7)
print("\nTakeaway: Linear probes are just regularized readout models on frozen features, so classical shrinkage logic still applies.")
Exercise 10 *** — Calibration Curve for Logistic Regression Scores
A probabilistic classifier should output calibrated probabilities, not only good rankings.
Part (a): Simulate predicted probabilities and binary labels.
Part (b): Bin predictions and compute empirical accuracy per bin.
Part (c): Compute expected calibration error (ECE).
Part (d): Explain why calibration is a regression-style diagnostic for classifiers.
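Part (c) uses the standard binned estimate of expected calibration error: with bins \(B_1, \dots, B_M\) over the predicted probabilities,

$$ \mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \, \bigl| \operatorname{acc}(B_m) - \operatorname{conf}(B_m) \bigr|, $$

where conf is the mean predicted probability in the bin and acc is the empirical frequency of the positive label.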
Code cell 32
# Your Solution
print("Write your solution here, then compare with the reference solution below.")
Code cell 33
# Solution
n = 4000
p_pred = np.random.beta(2, 2, size=n)
# Slight miscalibration: true probability is a softened version of p_pred.
p_true = 0.1 + 0.8 * p_pred
y = (np.random.rand(n) < p_true).astype(float)
bins = np.linspace(0, 1, 11)
ids = np.digitize(p_pred, bins) - 1
ece = 0.0
rows = []
for b in range(10):
    mask = ids == b
    if mask.any():
        conf = p_pred[mask].mean()
        acc = y[mask].mean()
        weight = mask.mean()
        ece += weight * abs(acc - conf)
        rows.append((b, conf, acc, weight))
header('Exercise 10: Calibration Curve and ECE')
for b, conf, acc, weight in rows:
    print(f'bin {b}: confidence={conf:.3f}, accuracy={acc:.3f}, weight={weight:.3f}')
print(f'ECE={ece:.4f}')
check_true('ECE is positive for miscalibrated scores', ece > 0.02)
check_true('Bins cover almost all samples', sum(r[3] for r in rows) > 0.99)
print('Takeaway: a classifier can rank well while still needing a calibrated regression from scores to probabilities.')
What to Review After Finishing
- Re-derive the OLS slope from covariance and variance without looking.
- Explain why X^T e = 0 is a geometric statement about projection.
- Distinguish confidence intervals for the mean response from prediction intervals for a new observation.
- Remember that collinearity hurts interpretation before it necessarily hurts prediction.
- Keep the geometric difference between Ridge and Lasso in mind when choosing a regularizer.
- Be able to interpret logistic coefficients as odds changes and Poisson coefficients as rate multipliers.
- Recognize that linear probes and weight-decay-style penalties are modern ML reappearances of classical regression ideas.
References
- Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning.
- Gelman, Hill, and Vehtari, Regression and Other Stories.
- Tibshirani (1996), “Regression Shrinkage and Selection via the Lasso.”
- Nelder and Wedderburn (1972), “Generalized Linear Models.”