
Converted from exercises.ipynb for web reading.

Neural Networks: Exercises

Ten exercises cover forward passes, activations, cross-entropy, backprop shapes, initialization, dropout, normalization, and diagnostics.

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Exercise 1: Affine layer

Compute z=Wx+b.

Code cell 4

# Your Solution
x = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0], [0.5, -1.0]])
b = np.array([0.1, 0.2])
print("Starter: W@x+b.")

Code cell 5

# Solution
x = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0], [0.5, -1.0]])
b = np.array([0.1, 0.2])
z = W @ x + b
print("z:", z)

Exercise 2: ReLU

Apply ReLU and compute its derivative.

Code cell 7

# Your Solution
z = np.array([-1.0, 0.0, 2.0])
print("Starter: max(0,z), derivative z>0.")

Code cell 8

# Solution
z = np.array([-1.0, 0.0, 2.0])
print("relu:", np.maximum(0, z))
print("drelu:", (z > 0).astype(float))

Exercise 3: Two-layer forward

Compute the forward pass of a tiny two-layer network.

Code cell 10

# Your Solution
x = np.array([1.0, -1.0])
W1 = np.eye(2)
W2 = np.array([[1.0, 1.0]])
print("Starter: h=ReLU(W1@x), y=W2@h.")

Code cell 11

# Solution
x = np.array([1.0, -1.0])
W1 = np.eye(2)
W2 = np.array([[1.0, 1.0]])
h = np.maximum(0, W1 @ x)
y = W2 @ h
print("h:", h, "y:", y)

Exercise 4: Softmax CE

Compute the softmax cross-entropy loss for target class 0.

Code cell 13

# Your Solution
logits = np.array([2.0, 1.0, 0.0])
target = 0
print("Starter: softmax then -log p[target].")

Code cell 14

# Solution
logits = np.array([2.0, 1.0, 0.0])
target = 0
e = np.exp(logits - logits.max())
p = e / e.sum()
loss = -np.log(p[target])
print("p:", p, "loss:", loss)

Exercise 5: Affine gradient

Compute the weight gradient dW for one example.

Code cell 16

# Your Solution
dZ = np.array([2.0, -1.0])
x = np.array([3.0, 4.0])
print("Starter: outer(dZ,x).")

Code cell 17

# Solution
dZ = np.array([2.0, -1.0])
x = np.array([3.0, 4.0])
dW = np.outer(dZ, x)
print("dW:\n", dW)

Exercise 6: Gradient check idea

Compute the central-difference gradient estimate for f(w)=w^2.

Code cell 19

# Your Solution
w = 3.0
eps = 1e-5
print("Starter: (f(w+eps)-f(w-eps))/(2eps).")

Code cell 20

# Solution
w = 3.0
eps = 1e-5
f = lambda a: a**2
num = (f(w + eps) - f(w - eps)) / (2 * eps)
print("numeric gradient:", num)

Exercise 7: Initialization scales

Compute the Xavier and He initialization standard deviations.

Code cell 22

# Your Solution
fan_in, fan_out = 100, 200
print("Starter: sqrt(2/(fan_in+fan_out)) and sqrt(2/fan_in).")

Code cell 23

# Solution
fan_in, fan_out = 100, 200
xavier = np.sqrt(2 / (fan_in + fan_out))
he = np.sqrt(2 / fan_in)
print("xavier:", xavier, "he:", he)

Exercise 8: Dropout

Apply inverted dropout with a given mask.

Code cell 25

# Your Solution
h = np.array([1.0, 1.0, 1.0])
mask = np.array([1.0, 0.0, 1.0])
p_drop = 1/3
print("Starter: h*mask/(1-p_drop).")

Code cell 26

# Solution
h = np.array([1.0, 1.0, 1.0])
mask = np.array([1.0, 0.0, 1.0])
p_drop = 1/3
out = h * mask / (1 - p_drop)
print("out:", out)

Exercise 9: LayerNorm

Normalize one vector.

Code cell 28

# Your Solution
x = np.array([1.0, 2.0, 3.0])
print("Starter: subtract mean and divide by std.")

Code cell 29

# Solution
x = np.array([1.0, 2.0, 3.0])
y = (x - x.mean()) / np.sqrt(x.var() + 1e-5)
print("y:", y)

Exercise 10: Checklist

Write four neural-network diagnostics.

Code cell 31

# Your Solution
print("Starter: include shapes, tiny overfit, activations, gradients.")

Code cell 32

# Solution
checks = [
    "forward tensor shapes are correct",
    "model can overfit a tiny batch",
    "activation statistics are healthy",
    "gradient norms are tracked by layer",
]
for check in checks:
    print("-", check)

Closing Reflection

Neural network debugging is mostly about checking the health of the values and gradients flowing between the input and the loss.