Converted from exercises.ipynb for web reading.

Regression Analysis — Exercises

10 exercises covering OLS, projection geometry, intervals, diagnostics, regularization, GLMs, influence, ML-style linear probes, and calibration.

Each exercise uses three cell types:

  • Problem: markdown cell with the exercise statement
  • Your Solution: runnable scaffold cell
  • Solution: reference implementation with checks and takeaway

Difficulty Levels

  • * (Exercises 1-3): Core OLS mechanics and interval reasoning
  • ** (Exercises 4-7): Diagnostics, regularization, logistic and Poisson regression
  • *** (Exercises 8-10): Influence analysis, ML-facing probe intuition, and calibration

Topic Map

  • Exercise 1: Covariance-to-slope connection
  • Exercise 2: Matrix OLS and orthogonality
  • Exercise 3: Confidence vs prediction intervals
  • Exercise 4: Multicollinearity and VIF
  • Exercise 5: Ridge vs Lasso
  • Exercise 6: Logistic regression
  • Exercise 7: Poisson / rate modeling
  • Exercise 8: Leverage and Cook's distance
  • Exercise 9: Linear probes and regularization

Additional applied exercise: Exercise 10, a calibration curve for logistic scores.

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
import numpy.linalg as la
from scipy import stats
from sklearn.linear_model import Ridge, Lasso, LogisticRegression, PoissonRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

try:
    import matplotlib.pyplot as plt
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

np.set_printoptions(precision=6, suppress=True)
np.random.seed(42)


def header(title):
    print("\n" + "=" * len(title))
    print(title)
    print("=" * len(title))


def check_close(name, got, expected, tol=1e-8):
    ok = np.allclose(got, expected, atol=tol, rtol=tol)
    print(f"{'PASS' if ok else 'FAIL'} {name}")
    if not ok:
        print("  expected:", expected)
        print("  got     :", got)
    return ok


def check_true(name, cond):
    print(f"{'PASS' if cond else 'FAIL'} {name}")
    return cond


def add_intercept(X):
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(X.shape[0]), X])


def ols_beta(X, y):
    return la.lstsq(np.asarray(X, dtype=float), np.asarray(y, dtype=float), rcond=None)[0]


def rss(y, fitted):
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return float(np.sum((y - fitted) ** 2))


def vif_scores(X):
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        beta = ols_beta(add_intercept(others), target)
        fit = add_intercept(others) @ beta
        ss_res = np.sum((target - fit) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        scores.append(1 / (1 - min(r2, 0.999999999)))
    return np.array(scores)


def prediction_interval(X, y, x_star, alpha=0.05):
    beta = ols_beta(X, y)
    fitted = X @ beta
    n, p = X.shape
    s2 = rss(y, fitted) / (n - p)
    xtx_inv = la.pinv(X.T @ X)
    x_star = np.asarray(x_star, dtype=float)
    mean = float(x_star @ beta)
    tcrit = stats.t.ppf(1 - alpha / 2, n - p)
    mean_se = np.sqrt(s2 * x_star @ xtx_inv @ x_star)
    pred_se = np.sqrt(s2 * (1 + x_star @ xtx_inv @ x_star))
    return mean, (mean - tcrit * mean_se, mean + tcrit * mean_se), (mean - tcrit * pred_se, mean + tcrit * pred_se)


def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print("Regression helper setup complete.")

Exercise 1 * — From Covariance to the OLS Slope

Let x = [1, 2, 3, 5, 6] and y = [2, 4, 5, 7, 9].

Task

  • (a) Compute the simple-regression slope from covariance divided by variance.
  • (b) Compute the slope from the OLS matrix formula.
  • (c) Verify the two answers agree.
  • (d) Compute the intercept.

Code cell 5

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 6

# Solution
def slope_from_covariance(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

x = np.array([1., 2., 3., 5., 6.])
y = np.array([2., 4., 5., 7., 9.])
slope, intercept = slope_from_covariance(x, y)
beta = ols_beta(add_intercept(x), y)

header("Exercise 1: Covariance-to-slope derivation")
print(f"Slope from covariance formula: {slope:.6f}")
print(f"Intercept                    : {intercept:.6f}")
check_close("slope matches OLS", slope, beta[1])
check_close("intercept matches OLS", intercept, beta[0])
print("\nTakeaway: In simple regression, the OLS slope is exactly sample covariance divided by predictor variance.")
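As an optional cross-check (not part of the reference solution), `np.polyfit` with degree 1 solves the same least-squares problem, so its slope should match the covariance formula exactly:

```python
import numpy as np

x = np.array([1., 2., 3., 5., 6.])
y = np.array([2., 4., 5., 7., 9.])

# np.polyfit returns coefficients highest degree first: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
cov_slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
print(np.allclose(slope, cov_slope))  # True
```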

Exercise 2 * — Matrix OLS and Residual Orthogonality

Use the design matrix

X = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}

and response vector y = [1, 2, 2, 4].

Task

  • (a) Compute the OLS coefficients.
  • (b) Compute the fitted values and residuals.
  • (c) Verify that X^T e = 0.

Code cell 8

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 9

# Solution
def solve_ols_system(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = ols_beta(X, y)
    fitted = X @ beta
    resid = y - fitted
    ortho = X.T @ resid
    return beta, fitted, resid, ortho

X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([1., 2., 2., 4.])
beta, fitted, resid, ortho = solve_ols_system(X, y)

header("Exercise 2: Matrix OLS and orthogonality")
print("beta   :", beta)
print("fitted :", fitted)
print("resid  :", resid)
print("X^T e  :", ortho)
check_close("normal equations residual is zero", ortho, np.zeros(2))
check_close("fitted + resid reconstructs y", fitted + resid, y)
print("\nTakeaway: OLS solves a projection problem, so the residual vector is orthogonal to every modeled direction.")
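The projection claim in the takeaway can be made concrete with the hat matrix H = X(X^T X)^{-1} X^T. This optional sketch checks that H is symmetric and idempotent, and that H y reproduces the fitted values:

```python
import numpy as np
import numpy.linalg as la

X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([1., 2., 2., 4.])

# Hat matrix: the orthogonal projector onto the column space of X.
H = X @ la.inv(X.T @ X) @ X.T

print(np.allclose(H, H.T))    # True: symmetric
print(np.allclose(H @ H, H))  # True: idempotent, projecting twice changes nothing
print(np.allclose(H @ y, X @ la.lstsq(X, y, rcond=None)[0]))  # True: H y = fitted values
```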

Exercise 3 * — Confidence Interval vs Prediction Interval

Let x = [0, 1, 2, 3, 4] and y = [1.0, 1.8, 3.2, 4.1, 5.1].

Task

  • (a) Fit the linear model with intercept.
  • (b) Compute the predicted mean at x*=2.5.
  • (c) Compute a 95% confidence interval for the mean response.
  • (d) Compute a 95% prediction interval for a new observation.

Code cell 11

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 12

# Solution
def regression_intervals(x, y, x_star):
    X = add_intercept(x)
    mean, ci, pi = prediction_interval(X, np.asarray(y, dtype=float), np.array([1.0, float(x_star)]))
    return mean, ci, pi

x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.0, 1.8, 3.2, 4.1, 5.1])
mean, ci, pi = regression_intervals(x, y, 2.5)

header("Exercise 3: Regression intervals")
print(f"Predicted mean at x*=2.5: {mean:.6f}")
print(f"95% CI for mean response: {ci}")
print(f"95% prediction interval : {pi}")
check_true("prediction interval wider than confidence interval", (pi[1] - pi[0]) > (ci[1] - ci[0]))
check_true("predicted mean lies inside both intervals", ci[0] <= mean <= ci[1] and pi[0] <= mean <= pi[1])
print("\nTakeaway: Prediction intervals are wider because they include fresh observation noise in addition to mean-estimation uncertainty.")
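A related point the exercise does not test: the confidence interval for the mean is narrowest at the predictor mean and widens as x* moves away, because the quadratic form x*^T (X^T X)^{-1} x* is minimized at x̄. A minimal self-contained check (the `ci_width` helper below is illustrative, not from the notebook's helper cell):

```python
import numpy as np
import numpy.linalg as la
from scipy import stats

x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.0, 1.8, 3.2, 4.1, 5.1])
X = np.column_stack([np.ones_like(x), x])
beta, *_ = la.lstsq(X, y, rcond=None)
n, p = X.shape
s2 = np.sum((y - X @ beta) ** 2) / (n - p)
xtx_inv = la.inv(X.T @ X)
tcrit = stats.t.ppf(0.975, n - p)

def ci_width(x_star):
    # Full width of the 95% CI for the mean response at x_star.
    v = np.array([1.0, x_star])
    return 2 * tcrit * np.sqrt(s2 * v @ xtx_inv @ v)

# Narrowest at the predictor mean (here x-bar = 2), wider farther out.
print(ci_width(2.0) < ci_width(4.0))  # True
```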

Exercise 4 ** — Multicollinearity, Condition Number, and VIF

Build a design with highly correlated predictors:

  • x1 = [0, 1, 2, 3, 4, 5]
  • x2 = [0.1, 1.1, 2.1, 3.0, 4.2, 5.1]
  • x3 = [2, 1, 0, 1, 2, 3]

Task

  • (a) Compute the condition number of the predictor matrix.
  • (b) Compute VIF values.
  • (c) Decide whether collinearity looks serious.

Code cell 14

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 15

# Solution
def collinearity_report(X):
    X = np.asarray(X, dtype=float)
    X_std = StandardScaler().fit_transform(X)
    condition = la.cond(X_std)
    vifs = vif_scores(X_std)
    serious = bool(np.max(vifs) > 10 or condition > 20)
    return condition, vifs, serious

X = np.array([
    [0.0, 0.1, 2.0],
    [1.0, 1.1, 1.0],
    [2.0, 2.1, 0.0],
    [3.0, 3.0, 1.0],
    [4.0, 4.2, 2.0],
    [5.0, 5.1, 3.0],
])
condition, vifs, serious = collinearity_report(X)

header("Exercise 4: Multicollinearity diagnostics")
print(f"Condition number: {condition:.6f}")
print("VIFs:", np.round(vifs, 6))
print("Serious collinearity:", serious)
check_true("one VIF is large", np.max(vifs) > 10)
check_true("condition number is elevated", condition > 20)
print("\nTakeaway: Collinearity inflates coefficient uncertainty even when the model can still fit the response well.")
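To see the takeaway numerically, one can compare the diagonal of (X^T X)^{-1}, which drives coefficient variance, for a nearly collinear design against a roughly orthogonal one. This simulation sketch uses illustrative data, not the exercise's design matrix:

```python
import numpy as np
import numpy.linalg as la

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_orth = rng.normal(size=n)             # roughly independent of x1
x2_corr = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1

def var_factor(a, b):
    # Var(beta_1) = sigma^2 * [(X^T X)^{-1}]_{11}; compare that factor.
    X = np.column_stack([np.ones(n), a, b])
    return la.inv(X.T @ X)[1, 1]

# Collinearity inflates the variance factor for the x1 coefficient,
# by roughly the VIF, even though both designs fit y equally well.
print(var_factor(x1, x2_corr) / var_factor(x1, x2_orth))  # much larger than 1
```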

Exercise 5 ** — Ridge vs Lasso Under Correlated Predictors

Create a small prediction problem with redundant predictors and compare penalties.

Task

  • (a) Fit Ridge and Lasso after standardizing the predictors.
  • (b) Compare coefficient norms.
  • (c) Count how many exact zeros Lasso produces.
  • (d) Compare test MSE.

Code cell 17

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 18

# Solution
def ridge_lasso_compare(X_train, y_train, X_test, y_test):
    scaler = StandardScaler().fit(X_train)
    Xtr = scaler.transform(X_train)
    Xte = scaler.transform(X_test)
    ridge = Ridge(alpha=3.0).fit(Xtr, y_train)
    lasso = Lasso(alpha=0.15, max_iter=20000).fit(Xtr, y_train)
    out = {
        "ridge_norm": la.norm(ridge.coef_),
        "lasso_norm": la.norm(lasso.coef_),
        "lasso_nonzero": int(np.sum(np.abs(lasso.coef_) > 1e-6)),
        "ridge_mse": mean_squared_error(y_test, ridge.predict(Xte)),
        "lasso_mse": mean_squared_error(y_test, lasso.predict(Xte)),
    }
    return out

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=120)
true_beta = np.array([2.0, 2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=1.0, size=120)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
out = ridge_lasso_compare(X_train, y_train, X_test, y_test)

header("Exercise 5: Ridge vs Lasso")
print(out)
check_true("Lasso yields some sparsity", out["lasso_nonzero"] < 6)
check_true("Coefficient norms are finite", np.isfinite(out["ridge_norm"]) and np.isfinite(out["lasso_norm"]))
print("\nTakeaway: Ridge spreads weight across correlated predictors, while Lasso tends to select a sparse subset.")
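The sparsity behavior can be watched directly by sweeping the Lasso penalty: as `alpha` grows, weak coefficients are soft-thresholded to exactly zero. A small illustrative sweep (the data and `alpha` grid here are arbitrary, not the exercise's):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -0.5]) + rng.normal(size=100)

# As the L1 penalty grows, weak coefficients hit exactly zero;
# the count of surviving coefficients shrinks with alpha.
for alpha in [0.01, 0.1, 1.0]:
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
    print(alpha, int(np.sum(np.abs(coef) > 1e-8)))
```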

Exercise 6 ** — Logistic Regression and Odds Ratios

Use the binary dataset

  • X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
  • y = [0, 0, 0, 1, 1, 1]

Task

  • (a) Fit logistic regression.
  • (b) Report the predicted probability at x=0.25.
  • (c) Convert the slope into an odds ratio.
  • (d) State whether the predicted class at x=0.25 is 0 or 1 under threshold 0.5.

Code cell 20

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 21

# Solution
def logistic_summary(X, y, x_star):
    model = LogisticRegression(C=1e6, solver='lbfgs', max_iter=2000)
    model.fit(X, y)
    prob = model.predict_proba(x_star)[0, 1]
    odds_ratio = float(np.exp(model.coef_[0, 0]))
    pred = int(prob >= 0.5)
    return model.intercept_[0], model.coef_[0, 0], prob, odds_ratio, pred

X = np.array([[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
intercept, slope, prob, odds_ratio, pred = logistic_summary(X, y, np.array([[0.25]]))

header("Exercise 6: Logistic regression summary")
print(f"Intercept            : {intercept:.6f}")
print(f"Slope                : {slope:.6f}")
print(f"Predicted probability: {prob:.6f}")
print(f"Odds ratio           : {odds_ratio:.6f}")
print(f"Predicted class      : {pred}")
check_true("probability lies in [0, 1]", 0 <= prob <= 1)
check_true("odds ratio is positive", odds_ratio > 0)
print("\nTakeaway: Logistic regression keeps prediction on the probability scale while giving interpretable multiplicative odds changes.")
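The setup cell defines a `sigmoid` helper that this exercise never calls; it can be used to confirm that `predict_proba` is just the sigmoid of the fitted linear predictor (the helper is redefined below so the cell runs on its own):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression(C=1e6, solver="lbfgs", max_iter=2000).fit(X, y)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# predict_proba for class 1 is the sigmoid of intercept + slope * x.
z = model.intercept_[0] + model.coef_[0, 0] * 0.25
print(np.isclose(sigmoid(z), model.predict_proba([[0.25]])[0, 1]))  # True
```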

Exercise 7 ** — Poisson Regression for Rates

Use count data with varying exposure.

Task

  • (a) Fit a Poisson model on predictors [x, log(exposure)].
  • (b) Return predicted counts for the first two observations.
  • (c) Return the multiplicative rate ratio for a one-unit change in x.
  • (d) Verify predictions are positive.

Code cell 23

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 24

# Solution
def poisson_rate_summary(X, y):
    model = PoissonRegressor(alpha=1e-8, max_iter=3000)
    model.fit(X, y)
    preds = model.predict(X[:2])
    rate_ratio = float(np.exp(model.coef_[0]))
    return model.intercept_, model.coef_, preds, rate_ratio

x = np.array([-1.0, -0.3, 0.2, 0.5, 1.0, 1.4])
exposure = np.array([1.0, 1.5, 1.0, 2.0, 1.2, 1.8])
y = np.array([0, 1, 2, 4, 4, 8])
X = np.column_stack([x, np.log(exposure)])
intercept, coef, preds, rate_ratio = poisson_rate_summary(X, y)

header("Exercise 7: Poisson regression for rates")
print(f"Intercept: {intercept:.6f}")
print("Coefficients:", np.round(coef, 6))
print("First two predicted counts:", np.round(preds, 6))
print(f"Rate ratio for x: {rate_ratio:.6f}")
check_true("predicted counts are positive", np.all(preds > 0))
check_true("rate ratio is positive", rate_ratio > 0)
print("\nTakeaway: The Poisson log link converts additive effects on the linear predictor into multiplicative changes on the rate scale.")
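The takeaway's multiplicative reading can be checked by hand with the log link. The coefficient values below are illustrative placeholders, not the fitted values from the exercise:

```python
import numpy as np

# With the log link, log(mu) = b0 + b1*x + b2*log(exposure).
# Illustrative coefficients (not the fitted ones):
b0, b1, b2 = 0.2, 0.7, 1.0

def mu(x, exposure):
    return np.exp(b0 + b1 * x + b2 * np.log(exposure))

# A one-unit increase in x multiplies the expected count by exp(b1) ...
print(np.isclose(mu(1.0, 1.0) / mu(0.0, 1.0), np.exp(b1)))  # True
# ... and when b2 = 1 (a true offset), doubling exposure doubles the count.
print(np.isclose(mu(0.0, 2.0) / mu(0.0, 1.0), 2.0))  # True
```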

Exercise 8 *** — Leverage, Influence, and Cook's Distance

Consider the regression dataset

  • x = [0.0, 0.5, 1.0, 1.5, 2.0, 5.0]
  • y = [1.0, 1.4, 2.0, 2.5, 2.9, 7.4]

Task

  • (a) Compute the leverage values.
  • (b) Compute Cook's distance.
  • (c) Identify the most influential observation.

Code cell 26

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 27

# Solution
def influence_summary(x, y):
    X = add_intercept(x)
    beta = ols_beta(X, y)
    fitted = X @ beta
    resid = y - fitted
    H = X @ la.pinv(X.T @ X) @ X.T
    h = np.diag(H)
    s2 = rss(y, fitted) / (len(y) - X.shape[1])
    cooks = (resid**2 / (X.shape[1] * s2)) * (h / (1 - h) ** 2)
    idx = int(np.argmax(cooks))
    return h, cooks, idx

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 5.0])
y = np.array([1.0, 1.4, 2.0, 2.5, 2.9, 7.4])
h, cooks, idx = influence_summary(x, y)

header("Exercise 8: Influence diagnostics")
print("Leverage:", np.round(h, 6))
print("Cook's distance:", np.round(cooks, 6))
print("Most influential index:", idx)
check_true("last observation has highest leverage", int(np.argmax(h)) == len(x) - 1)
check_true("same observation is most influential", idx == len(x) - 1)
print("\nTakeaway: A point can matter a lot because it sits far out in predictor space, not only because its residual is large.")
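A blunter way to see the same thing (not required by the task) is a leave-one-out refit: deleting the far-out point moves the slope more than deleting any interior point, matching the Cook's distance ranking:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 5.0])
y = np.array([1.0, 1.4, 2.0, 2.5, 2.9, 7.4])

def slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

full = slope(x, y)
# Refit with each point deleted; influence = how far the slope moves.
shifts = [abs(slope(np.delete(x, i), np.delete(y, i)) - full) for i in range(len(x))]
print(int(np.argmax(shifts)))  # 5: the far-out point at x = 5.0
```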

Exercise 9 *** — Linear Probe with Regularized Readout

Use frozen embedding-like features and compare weak vs strong regularization.

Task

  • (a) Fit logistic probes with weak and strong L2 regularization.
  • (b) Report test accuracy for both.
  • (c) Report the ratio of coefficient norms.
  • (d) Explain which model is more heavily shrunk.

Code cell 29

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 30

# Solution
def probe_compare(emb, labels):
    X_train, X_test, y_train, y_test = train_test_split(emb, labels, test_size=0.3, random_state=42)
    weak = LogisticRegression(C=10.0, solver='lbfgs', max_iter=2000)
    strong = LogisticRegression(C=0.1, solver='lbfgs', max_iter=2000)
    weak.fit(X_train, y_train)
    strong.fit(X_train, y_train)
    weak_acc = accuracy_score(y_test, weak.predict(X_test))
    strong_acc = accuracy_score(y_test, strong.predict(X_test))
    norm_ratio = la.norm(strong.coef_) / la.norm(weak.coef_)
    return weak_acc, strong_acc, norm_ratio

rng = np.random.default_rng(42)
emb = rng.normal(size=(600, 12))
true_w = rng.normal(size=12)
labels = (emb @ true_w + 0.4 * rng.normal(size=600) > 0).astype(int)
weak_acc, strong_acc, norm_ratio = probe_compare(emb, labels)

header("Exercise 9: Linear probe regularization")
print(f"Weak-regularization accuracy : {weak_acc:.6f}")
print(f"Strong-regularization accuracy: {strong_acc:.6f}")
print(f"Strong/weak coefficient-norm ratio: {norm_ratio:.6f}")
check_true("stronger regularization shrinks the probe", norm_ratio < 1.0)
check_true("both probes are useful classifiers", weak_acc > 0.7 and strong_acc > 0.7)
print("\nTakeaway: Linear probes are just regularized readout models on frozen features, so classical shrinkage logic still applies.")

Exercise 10 *** — Calibration Curve for Logistic Regression Scores

A probabilistic classifier should output calibrated probabilities, not only good rankings.

Part (a): Simulate predicted probabilities and binary labels.

Part (b): Bin predictions and compute empirical accuracy per bin.

Part (c): Compute expected calibration error (ECE).

Part (d): Explain why calibration is a regression-style diagnostic for classifiers.

Code cell 32

# Your Solution
print("Write your solution here, then compare with the reference solution below.")

Code cell 33

# Solution
n = 4000
p_pred = np.random.beta(2, 2, size=n)
# Slight miscalibration: true probability is a softened version of p_pred.
p_true = 0.1 + 0.8 * p_pred
y = (np.random.rand(n) < p_true).astype(float)

bins = np.linspace(0, 1, 11)
ids = np.digitize(p_pred, bins) - 1
ece = 0.0
rows = []
for b in range(10):
    mask = ids == b
    if mask.any():
        conf = p_pred[mask].mean()
        acc = y[mask].mean()
        weight = mask.mean()
        ece += weight * abs(acc - conf)
        rows.append((b, conf, acc, weight))

header('Exercise 10: Calibration Curve and ECE')
for b, conf, acc, weight in rows:
    print(f'bin {b}: confidence={conf:.3f}, accuracy={acc:.3f}, weight={weight:.3f}')
print(f'ECE={ece:.4f}')
check_true('ECE is positive for miscalibrated scores', ece > 0.02)
check_true('Bins cover almost all samples', sum(r[3] for r in rows) > 0.99)
print("\nTakeaway: A classifier can rank well while still needing a calibrated regression from scores to probabilities.")
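One contrast worth seeing (beyond the exercise): feeding the true probabilities through the same ECE computation gives a much smaller value, since the miscalibrated scores rank identically but sit on the wrong probability scale. A self-contained sketch with its own `ece` helper (a larger n than the exercise, to suppress binning noise):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
p_pred = rng.beta(2, 2, size=n)
p_true = 0.1 + 0.8 * p_pred          # monotone map: same ranking, shifted scale
y = (rng.random(n) < p_true).astype(float)

def ece(scores, y, n_bins=10):
    # Expected calibration error: bin weight times |accuracy - confidence|.
    ids = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = ids == b
        if mask.any():
            total += mask.mean() * abs(y[mask].mean() - scores[mask].mean())
    return total

print(f"ECE of miscalibrated scores: {ece(p_pred, y):.4f}")
print(f"ECE of true probabilities  : {ece(p_true, y):.4f}")
```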

What to Review After Finishing

  • Re-derive the OLS slope from covariance and variance without looking.
  • Explain why X^T e = 0 is a geometric statement about projection.
  • Distinguish confidence intervals for the mean response from prediction intervals for a new observation.
  • Remember that collinearity hurts interpretation before it necessarily hurts prediction.
  • Keep the geometric difference between Ridge and Lasso in mind when choosing a regularizer.
  • Be able to interpret logistic coefficients as odds changes and Poisson coefficients as rate multipliers.
  • Recognize that linear probes and weight-decay-style penalties are modern ML reappearances of classical regression ideas.

References

  1. Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning.
  2. Gelman, Hill, and Vehtari, Regression and Other Stories.
  3. Tibshirani (1996), “Regression Shrinkage and Selection via the Lasso.”
  4. Nelder and Wedderburn (1972), “Generalized Linear Models.”