Converted from exercises.ipynb for web reading.

Scaling Laws: Exercises

These ten exercises cover the arithmetic behind model-size planning: power-law fits, training FLOPs, IsoFLOP search, undertraining checks, effective tokens, serving costs, residuals, threshold artifacts, and forecast checklists.

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Exercise 1: Fit a power-law exponent

Given values of a scale variable X and measured losses with a known irreducible floor, fit the power-law exponent alpha in loss = floor + A * X^(-alpha).

Code cell 4

# Your Solution
X = np.array([1e2, 1e3, 1e4, 1e5])
loss = np.array([2.50, 2.10, 1.88, 1.74])
floor = 1.5
print("Starter: fit log(loss-floor) against log(X).")

Code cell 5

# Solution
X = np.array([1e2, 1e3, 1e4, 1e5])
loss = np.array([2.50, 2.10, 1.88, 1.74])
floor = 1.5
slope, intercept = np.polyfit(np.log(X), np.log(loss - floor), 1)
alpha = -slope
print("alpha:", alpha)

Exercise 2: Training FLOPs

Compute C = 6ND for N = 3B and D = 100B.

Code cell 7

# Your Solution
N = 3e9
D = 100e9
print("Starter: C = 6 * N * D.")

Code cell 8

# Solution
N = 3e9
D = 100e9
C = 6 * N * D
print("FLOPs:", f"{C:.3e}")

Exercise 3: Tokens from compute

Given C and N, solve D=C/(6N).

Code cell 10

# Your Solution
C = 9e21
N = 1.5e9
print("Starter: divide C by 6N.")

Code cell 11

# Solution
C = 9e21
N = 1.5e9
D = C / (6 * N)
print("tokens:", f"{D:.3e}")

Exercise 4: IsoFLOP search

Find the loss-minimizing N along a toy fixed-compute (IsoFLOP) curve.

Code cell 13

# Your Solution
C = 1e22
N_grid = np.logspace(8, 11, 100)
print("Starter: D=C/(6N), then minimize a toy loss.")

Code cell 14

# Solution
C = 1e22
N_grid = np.logspace(8, 11, 100)
D_grid = C / (6 * N_grid)
loss = 1.6 + (N_grid / 1e9) ** -0.08 + (D_grid / 1e10) ** -0.10
idx = np.argmin(loss)
print("best N:", N_grid[idx])
print("best D:", D_grid[idx])

Exercise 5: Undertraining check

Check whether the tokens-per-parameter ratio is below 20.

Code cell 16

# Your Solution
N = 13e9
D = 100e9
print("Starter: ratio = D / N.")

Code cell 17

# Solution
N = 13e9
D = 100e9
ratio = D / N
print("tokens per parameter:", ratio)
print("under 20:", ratio < 20)

Exercise 6: Effective tokens

Compute quality-weighted token count.

Code cell 19

# Your Solution
tokens = np.array([100, 200, 50])
quality = np.array([1.2, 0.7, 1.5])
print("Starter: dot(tokens, quality).")

Code cell 20

# Solution
tokens = np.array([100, 200, 50])
quality = np.array([1.2, 0.7, 1.5])
effective = np.dot(tokens, quality)
print("effective tokens:", effective)

Exercise 7: Serving cost

Compare two model choices on combined training and serving cost.

Code cell 22

# Your Solution
train = np.array([1.0, 5.0])
serve_per_m = np.array([0.02, 0.10])
queries_m = 100
print("Starter: total = train + queries_m * serve_per_m.")

Code cell 23

# Solution
train = np.array([1.0, 5.0])
serve_per_m = np.array([0.02, 0.10])
queries_m = 100
total = train + queries_m * serve_per_m
print("total costs:", total)

Exercise 8: Residuals

Compute residuals and max absolute residual.

Code cell 25

# Your Solution
actual = np.array([2.0, 1.8, 1.7])
pred = np.array([1.95, 1.82, 1.68])
print("Starter: residual = actual - pred.")

Code cell 26

# Solution
actual = np.array([2.0, 1.8, 1.7])
pred = np.array([1.95, 1.82, 1.68])
residual = actual - pred
print("residual:", residual)
print("max abs:", np.max(np.abs(residual)))

Exercise 9: Threshold artifact

Turn smooth scores into a binary threshold metric, and note how gradual gains near the threshold show up as an abrupt jump.

Code cell 28

# Your Solution
scores = np.array([0.45, 0.49, 0.51, 0.55])
print("Starter: compare scores > 0.5.")

Code cell 29

# Solution
scores = np.array([0.45, 0.49, 0.51, 0.55])
binary = (scores > 0.5).astype(int)
print("binary:", binary)

Exercise 10: Forecast checklist

Write four checks before trusting a scaling forecast.

Code cell 31

# Your Solution
print("Starter: include setup, held-out loss, residuals, and uncertainty.")

Code cell 32

# Solution
checks = [
    "setup matches between fit runs and forecast run",
    "held-out loss is measured on a fixed evaluation set",
    "residuals are checked on withheld runs",
    "forecast includes uncertainty and a stop condition",
]
for check in checks:
    print("-", check)

Closing Reflection

Scaling laws are useful because they discipline guesses. They are dangerous when their assumptions, residuals, and uncertainty are hidden.