Exercises Notebook
Converted from exercises.ipynb for web reading.
Scaling Laws: Exercises
Ten exercises cover the arithmetic behind model-size planning: power-law fits, FLOPs, IsoFLOP search, undertraining checks, effective tokens, serving costs, residuals, and forecast checklists.
Code cell 2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False
mpl.rcParams.update({
    "figure.figsize": (10, 6),
    "figure.dpi": 120,
    "font.size": 13,
    "axes.titlesize": 15,
    "axes.labelsize": 13,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
    "legend.fontsize": 11,
    "legend.framealpha": 0.85,
    "lines.linewidth": 2.0,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "savefig.bbox": "tight",
    "savefig.dpi": 150,
})
np.random.seed(42)
print("Plot setup complete.")
Exercise 1: Fit a power-law exponent
Given inputs X and loss values with a known irreducible floor, fit the power-law exponent alpha by regressing log(loss - floor) on log(X).
Code cell 4
# Your Solution
X = np.array([1e2, 1e3, 1e4, 1e5])
loss = np.array([2.50, 2.10, 1.88, 1.74])
floor = 1.5
print("Starter: fit log(loss-floor) against log(X).")
Code cell 5
# Solution
X = np.array([1e2, 1e3, 1e4, 1e5])
loss = np.array([2.50, 2.10, 1.88, 1.74])
floor = 1.5
slope, intercept = np.polyfit(np.log(X), np.log(loss - floor), 1)
alpha = -slope
print("alpha:", alpha)
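A quick sanity check on the fit is to use both the slope and the intercept to reproduce the measured losses in-sample; a minimal sketch, reusing the solution's data:

```python
import numpy as np

X = np.array([1e2, 1e3, 1e4, 1e5])
loss = np.array([2.50, 2.10, 1.88, 1.74])
floor = 1.5

slope, intercept = np.polyfit(np.log(X), np.log(loss - floor), 1)
alpha, A = -slope, np.exp(intercept)

# Predicted loss from the fitted power law: loss ~ floor + A * X**(-alpha)
pred = floor + A * X ** (-alpha)
print("alpha:", round(alpha, 3))
print("max abs residual:", np.max(np.abs(pred - loss)))
```

If the max absolute residual is large relative to the loss spread, a single power law plus constant floor may be the wrong functional form for these points.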
Exercise 2: Training FLOPs
Compute training FLOPs with C = 6ND for N = 3B parameters and D = 100B tokens.
Code cell 7
# Your Solution
N = 3e9
D = 100e9
print("Starter: C = 6 * N * D.")
Code cell 8
# Solution
N = 3e9
D = 100e9
C = 6 * N * D
print("FLOPs:", f"{C:.3e}")
Exercise 3: Tokens from compute
Given C and N, solve D=C/(6N).
Code cell 10
# Your Solution
C = 9e21
N = 1.5e9
print("Starter: divide C by 6N.")
Code cell 11
# Solution
C = 9e21
N = 1.5e9
D = C / (6 * N)
print("tokens:", f"{D:.3e}")
Exercise 4: IsoFLOP search
Find the best N in a toy fixed-compute curve.
Code cell 13
# Your Solution
C = 1e22
N_grid = np.logspace(8, 11, 100)
print("Starter: D=C/(6N), then minimize a toy loss.")
Code cell 14
# Solution
C = 1e22
N_grid = np.logspace(8, 11, 100)
D_grid = C / (6 * N_grid)
loss = 1.6 + (N_grid / 1e9) ** -0.08 + (D_grid / 1e10) ** -0.10
idx = np.argmin(loss)
print("best N:", N_grid[idx])
print("best D:", D_grid[idx])
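For this particular toy loss, the grid minimum can be cross-checked analytically: substituting D = C/(6N) turns the loss into a constant plus A·N^-0.08 + B·N^0.10, and setting the derivative to zero gives a closed form. A sketch, valid only for the exponents and scales used above:

```python
C = 1e22

# Coefficients after substituting D = C / (6 N) into the toy loss:
# loss(N) = 1.6 + A * N**-0.08 + B * N**0.10
A = (1e9) ** 0.08              # from (N / 1e9) ** -0.08
B = (6 / (C / 1e10)) ** 0.10   # from (C / (6 N) / 1e10) ** -0.10

# d loss / dN = 0  =>  0.08 * A * N**-0.08 = 0.10 * B * N**0.10
N_star = (0.08 * A / (0.10 * B)) ** (1 / 0.18)
D_star = C / (6 * N_star)
print("analytic best N:", f"{N_star:.3e}")
print("analytic best D:", f"{D_star:.3e}")
```

The analytic optimum should agree with the grid argmin to within the grid spacing (about 0.03 decades here), which is a useful check that the grid is fine enough.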
Exercise 5: Undertraining check
Check whether the tokens-per-parameter ratio D/N is below 20.
Code cell 16
# Your Solution
N = 13e9
D = 100e9
print("Starter: ratio = D / N.")
Code cell 17
# Solution
N = 13e9
D = 100e9
ratio = D / N
print("tokens per parameter:", ratio)
print("under 20:", ratio < 20)
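A natural follow-up is to ask how many more tokens, and how much extra compute at C = 6ND, it would take to reach the 20 tokens-per-parameter rule of thumb; a minimal sketch with the exercise's numbers:

```python
N = 13e9
D = 100e9
target_ratio = 20  # rough rule-of-thumb threshold from the exercise

D_needed = target_ratio * N            # tokens to reach 20 tokens/param
extra_flops = 6 * N * (D_needed - D)   # additional training compute at C = 6ND
print("tokens needed:", f"{D_needed:.3e}")
print("extra FLOPs:", f"{extra_flops:.3e}")
```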
Exercise 6: Effective tokens
Compute quality-weighted token count.
Code cell 19
# Your Solution
tokens = np.array([100, 200, 50])
quality = np.array([1.2, 0.7, 1.5])
print("Starter: dot(tokens, quality).")
Code cell 20
# Solution
tokens = np.array([100, 200, 50])
quality = np.array([1.2, 0.7, 1.5])
effective = np.dot(tokens, quality)
print("effective tokens:", effective)
Exercise 7: Serving cost
Compare two model choices on total cost: one-time training cost plus serving cost over a given query volume.
Code cell 22
# Your Solution
train = np.array([1.0, 5.0])
serve_per_m = np.array([0.02, 0.10])
queries_m = 100
print("Starter: total = train + queries_m * serve_per_m.")
Code cell 23
# Solution
train = np.array([1.0, 5.0])
serve_per_m = np.array([0.02, 0.10])
queries_m = 100
total = train + queries_m * serve_per_m
print("total costs:", total)
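In the exercise's numbers the first choice is cheaper to both train and serve, so it wins at any volume. The more interesting case is a model that costs more to train but less to serve, where a break-even query volume exists. A sketch with hypothetical costs (not the exercise's data):

```python
# Hypothetical costs: model B trains for more but serves for less,
# so its higher training cost is amortized at some query volume.
train_a, train_b = 1.0, 5.0      # one-time training cost
serve_a, serve_b = 0.10, 0.02    # serving cost per million queries

# total_a(q) = train_a + q * serve_a; totals are equal at the break-even q
q_star = (train_b - train_a) / (serve_a - serve_b)
print("break-even queries (millions):", q_star)
```

Below the break-even volume the cheap-to-train model wins; above it, the cheap-to-serve model does.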
Exercise 8: Residuals
Compute residuals and max absolute residual.
Code cell 25
# Your Solution
actual = np.array([2.0, 1.8, 1.7])
pred = np.array([1.95, 1.82, 1.68])
print("Starter: residual = actual - pred.")
Code cell 26
# Solution
actual = np.array([2.0, 1.8, 1.7])
pred = np.array([1.95, 1.82, 1.68])
residual = actual - pred
print("residual:", residual)
print("max abs:", np.max(np.abs(residual)))
Exercise 9: Threshold artifact
Turn smooth scores into a binary threshold metric.
Code cell 28
# Your Solution
scores = np.array([0.45, 0.49, 0.51, 0.55])
print("Starter: compare scores > 0.5.")
Code cell 29
# Solution
scores = np.array([0.45, 0.49, 0.51, 0.55])
binary = (scores > 0.5).astype(int)
print("binary:", binary)
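The point of this artifact is that a score improving smoothly with scale can look like a sudden capability jump once pushed through a threshold. A sketch with a hypothetical smooth trend (the slope and intercept here are made up for illustration):

```python
import numpy as np

# A hypothetical score that improves smoothly with model scale...
scale = np.logspace(8, 11, 7)
smooth = 0.28 + 0.08 * np.log10(scale / 1e8)

# ...appears as an abrupt 0 -> 1 jump after thresholding at 0.5.
binary = (smooth > 0.5).astype(int)
print("smooth:", np.round(smooth, 2))
print("binary:", binary)
```

The smooth column rises steadily at every scale, while the binary column is flat and then flips at a single point, which is how thresholded metrics can manufacture apparent "emergence".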
Exercise 10: Forecast checklist
Write four checks before trusting a scaling forecast.
Code cell 31
# Your Solution
print("Starter: include setup, held-out loss, residuals, and uncertainty.")
Code cell 32
# Solution
checks = [
    "setup matches between fit runs and forecast run",
    "held-out loss is measured on a fixed evaluation set",
    "residuals are checked on withheld runs",
    "forecast includes uncertainty and a stop condition",
]
for check in checks:
    print("-", check)
Closing Reflection
Scaling laws are useful because they discipline guesses. They are dangerous when their assumptions, residuals, and uncertainty are hidden.