Exercises Notebook
Converted from exercises.ipynb for web reading.
RNN and LSTM Math: Exercises
Ten exercises cover the recurring mechanics: hidden-state updates, sequence probability, BPTT gradient scale, gradient clipping, LSTM and GRU gates, padding masks, output shapes, attention context, and debugging diagnostics.
Code cell 2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize": (10, 6),
    "figure.dpi": 120,
    "font.size": 13,
    "axes.titlesize": 15,
    "axes.labelsize": 13,
    "xtick.labelsize": 11,
    "ytick.labelsize": 11,
    "legend.fontsize": 11,
    "legend.framealpha": 0.85,
    "lines.linewidth": 2.0,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "savefig.bbox": "tight",
    "savefig.dpi": 150,
})
np.random.seed(42)
print("Plot setup complete.")
Exercise 1: Vanilla RNN update
Compute one tanh hidden-state update.
Code cell 4
# Your Solution
x = np.array([1.0, -1.0])
h_prev = np.array([0.5, 0.0])
W_xh = np.eye(2)
W_hh = 0.5 * np.eye(2)
print("Starter: h=tanh(W_xh@x + W_hh@h_prev).")
Code cell 5
# Solution
x = np.array([1.0, -1.0])
h_prev = np.array([0.5, 0.0])
W_xh = np.eye(2)
W_hh = 0.5 * np.eye(2)
h = np.tanh(W_xh @ x + W_hh @ h_prev)
print("h:", h)
Exercise 2: Sequence probability
Multiply conditional probabilities for a sequence.
Code cell 7
# Your Solution
probs = np.array([0.8, 0.6, 0.5])
print("Starter: product for probability, sum logs for log probability.")
Code cell 8
# Solution
probs = np.array([0.8, 0.6, 0.5])
p = probs.prod()
logp = np.log(probs).sum()
print("p:", p, "logp:", logp)
Exercise 3: Gradient product
Compute the scalar gradient scale accumulated over 10 BPTT steps.
Code cell 10
# Your Solution
scale = 0.8
steps = 10
print("Starter: scale ** steps.")
Code cell 11
# Solution
scale = 0.8
steps = 10
print("gradient scale:", scale ** steps)
Exercise 4: Gradient clipping
Clip the vector [6, 8] to a maximum norm of 5.
Code cell 13
# Your Solution
g = np.array([6.0, 8.0])
print("Starter: multiply by min(1, 5/norm(g)).")
Code cell 14
# Solution
g = np.array([6.0, 8.0])
scale = min(1.0, 5.0 / np.linalg.norm(g))
clipped = g * scale
print("clipped:", clipped, "norm:", np.linalg.norm(clipped))
Exercise 5: LSTM cell update
Compute the new cell state c_t from the gates and the candidate.
Code cell 16
# Your Solution
f = np.array([0.9, 0.2])
i = np.array([0.1, 0.8])
c_prev = np.array([1.0, -1.0])
cand = np.array([0.5, 0.25])
print("Starter: c=f*c_prev + i*cand.")
Code cell 17
# Solution
f = np.array([0.9, 0.2])
i = np.array([0.1, 0.8])
c_prev = np.array([1.0, -1.0])
cand = np.array([0.5, 0.25])
c = f * c_prev + i * cand
print("c:", c)
Exercise 6: GRU update
Blend old state and candidate with update gate.
Code cell 19
# Your Solution
z = np.array([0.25, 0.75])
h_prev = np.array([1.0, -1.0])
h_tilde = np.array([0.0, 0.5])
print("Starter: h=(1-z)*h_prev + z*h_tilde.")
Code cell 20
# Solution
z = np.array([0.25, 0.75])
h_prev = np.array([1.0, -1.0])
h_tilde = np.array([0.0, 0.5])
h = (1 - z) * h_prev + z * h_tilde
print("h:", h)
Exercise 7: Masked loss
Average losses over real tokens only.
Code cell 22
# Your Solution
losses = np.array([0.4, 0.6, 0.0])
mask = np.array([1, 1, 0])
print("Starter: sum(losses*mask)/sum(mask).")
Code cell 23
# Solution
losses = np.array([0.4, 0.6, 0.0])
mask = np.array([1, 1, 0])
masked = (losses * mask).sum() / mask.sum()
print("masked loss:", masked)
Exercise 8: Task shapes
Identify the output shape for many-to-many logits.
Code cell 25
# Your Solution
B, T, V = 3, 5, 100
print("Starter: logits shape is (B,T,V).")
Code cell 26
# Solution
B, T, V = 3, 5, 100
logits_shape = (B, T, V)
print("logits shape:", logits_shape)
Exercise 9: Attention context
Compute attention context from weights and encoder states.
Code cell 28
# Your Solution
weights = np.array([0.2, 0.3, 0.5])
encoder = np.array([[1.,0.], [0.,1.], [1.,1.]])
print("Starter: context = weights @ encoder.")
Code cell 29
# Solution
weights = np.array([0.2, 0.3, 0.5])
encoder = np.array([[1.,0.], [0.,1.], [1.,1.]])
context = weights @ encoder
print("context:", context)
Exercise 10: Debug checklist
Write four RNN diagnostics.
Code cell 31
# Your Solution
print("Starter: include masks, gradient norms, gate stats, and length tests.")
Code cell 32
# Solution
checks = [
    "padding masks are applied before averaging loss",
    "gradient norms are tracked through time",
    "LSTM/GRU gate saturation is monitored",
    "short and long sequence metrics are reported separately",
]
for check in checks:
    print("-", check)
Closing Reflection
RNNs teach the core sequence-learning tension: memory needs long paths, but gradients dislike long paths. Gates and attention are two different answers to that tension.
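A back-of-the-envelope illustration of that tension, under assumed per-step factors: a multiplicative tanh-style path shrinks geometrically over 50 steps, while an additive cell path with a forget gate near 1 barely decays.
import numpy as np

T = 50
tanh_factor = 0.8    # assumed per-step Jacobian scale for a vanilla RNN path
forget_gate = 0.99   # assumed near-saturated forget gate on an LSTM cell path
print("vanilla path after", T, "steps:", tanh_factor ** T)   # ~1.4e-05, vanished
print("gated cell path after", T, "steps:", forget_gate ** T)  # ~0.605, preserved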