Stochastic Processes: Appendix V: Numerical Methods for SDEs to Appendix X: Glossary

Appendix V: Numerical Methods for SDEs

V.1 Euler-Maruyama Method

The simplest numerical scheme for $dX = b(X,t)\,dt + \sigma(X,t)\,dB_t$:

$$X_{n+1} = X_n + b(X_n, t_n)\,\Delta t + \sigma(X_n, t_n)\,\Delta B_n$$

where $\Delta B_n \sim \mathcal{N}(0, \Delta t)$.

Convergence: Euler-Maruyama has strong order 1/2 (pathwise error $\sim \sqrt{\Delta t}$) and weak order 1 (distributional error $\sim \Delta t$). Compare this with the Euler method for ODEs, which has pathwise error of order 1 - the $\sqrt{\Delta t}$ rate reflects the irregularity of BM.
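
To make the recursion concrete, here is a minimal NumPy sketch of the Euler-Maruyama loop, applied to an OU-type SDE $dX = -\theta X\,dt + \sigma\,dB_t$; the function names, parameter values, and step count are illustrative choices, not part of the text above.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = b(X, t) dt + sigma(X, t) dB_t."""
    dt = T / n_steps
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for n in range(n_steps):
        t = n * dt
        dB = rng.normal(0.0, np.sqrt(dt))  # Delta B_n ~ N(0, dt)
        xs[n + 1] = xs[n] + b(xs[n], t) * dt + sigma(xs[n], t) * dB
    return xs

# Illustrative example: mean-reverting OU dynamics dX = -theta*X dt + sig dB_t
theta, sig = 1.0, 0.5
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x, t: -theta * x, lambda x, t: sig,
                      x0=1.0, T=5.0, n_steps=1000, rng=rng)
print(path[-1])
```

Halving $\Delta t$ should roughly halve the weak error but only shrink the pathwise (strong) error by a factor of $\sqrt{2}$, matching the orders quoted above.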

V.2 Milstein Method

The Milstein method improves strong convergence to order 1 by adding an Ito correction:

$$X_{n+1} = X_n + b\,\Delta t + \sigma\,\Delta B_n + \frac{1}{2}\sigma\sigma'\left[(\Delta B_n)^2 - \Delta t\right]$$

The extra term $\frac{1}{2}\sigma\sigma'[(\Delta B_n)^2 - \Delta t]$ comes from the second-order term in the Ito formula. For scalar SDEs, Milstein restores the strong order 1 that the plain Euler method already achieves for deterministic ODEs.
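
As a rough empirical check, the sketch below (a toy setup with assumed parameters, not taken from the text) runs both schemes on geometric Brownian motion $dX = \mu X\,dt + \sigma X\,dB_t$, where $\sigma(x) = \sigma x$ gives $\sigma'(x) = \sigma$ and the exact terminal value is known in closed form, so the strong errors can be compared directly.

```python
import numpy as np

# Geometric BM: dX = mu*X dt + sig*X dB; exact X_T = X_0 * exp((mu - sig^2/2) T + sig B_T)
mu, sig, x0, T, n_steps, n_paths = 0.05, 0.4, 1.0, 1.0, 200, 2000
dt = T / n_steps
rng = np.random.default_rng(1)

err_em, err_mil = 0.0, 0.0
for _ in range(n_paths):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    x_em, x_mil = x0, x0
    for k in range(n_steps):
        # Euler-Maruyama step
        x_em = x_em + mu * x_em * dt + sig * x_em * dB[k]
        # Milstein step: adds (1/2) sigma(x) sigma'(x) [(dB)^2 - dt], here sigma(x) = sig*x, sigma'(x) = sig
        x_mil = (x_mil + mu * x_mil * dt + sig * x_mil * dB[k]
                 + 0.5 * (sig * x_mil) * sig * (dB[k] ** 2 - dt))
    x_exact = x0 * np.exp((mu - 0.5 * sig ** 2) * T + sig * dB.sum())
    err_em += abs(x_em - x_exact)
    err_mil += abs(x_mil - x_exact)

print("mean strong error, Euler-Maruyama:", err_em / n_paths)
print("mean strong error, Milstein:      ", err_mil / n_paths)
```

With these (arbitrary) parameters the Milstein error should be noticeably smaller, and it shrinks like $\Delta t$ rather than $\sqrt{\Delta t}$ as the grid is refined.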

V.3 DDPM as Euler-Maruyama

The DDPM update $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t$ is, to first order in $\beta_t$, the Euler-Maruyama discretisation of the VP-SDE $dx = -\frac{1}{2}\beta(t)x\,dt + \sqrt{\beta(t)}\,dB_t$ with step size $\Delta t = 1/T$ (writing $\beta_t = \beta(t)\,\Delta t$ for the discrete noise level):

$$x_t = x_{t-1} - \frac{\beta_t}{2}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t = \left(1-\frac{\beta_t}{2}\right)x_{t-1} + \sqrt{\beta_t}\,\epsilon_t \approx \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t$$

where the approximation $\sqrt{1-\beta_t} \approx 1 - \beta_t/2$ holds for small $\beta_t$. DDIM corresponds to a deterministic discretisation of the associated probability flow ODE, which allows larger step sizes.
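
A few lines of NumPy make the identification explicit: one Euler-Maruyama step of the VP-SDE versus one DDPM forward step with the same Gaussian draw. The vector size and the value of $\beta_t$ are illustrative assumptions, not a schedule from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(2)
x_prev = rng.normal(size=4)   # a toy x_{t-1}
beta_t = 0.02                 # illustrative per-step noise level, beta_t = beta(t) * dt
eps = rng.normal(size=4)      # shared Gaussian draw epsilon_t

# Euler-Maruyama step of dx = -(1/2) beta(t) x dt + sqrt(beta(t)) dB_t
x_em = (1.0 - 0.5 * beta_t) * x_prev + np.sqrt(beta_t) * eps

# DDPM forward update
x_ddpm = np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

print(np.sqrt(1.0 - beta_t), 1.0 - 0.5 * beta_t)   # ~0.98995 vs 0.99000: first-order agreement
print(np.max(np.abs(x_em - x_ddpm)))               # discrepancy of order beta_t^2
```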


Appendix W: Bridge Processes and Conditional Diffusions

W.1 Brownian Bridge

A Brownian bridge from $a$ to $b$ on $[0,T]$ is a Brownian motion conditioned to be at $b$ at time $T$:

$$X_t^{\text{BB}} = a + (b-a)\frac{t}{T} + B_t - \frac{t}{T}B_T$$

This is a Gaussian process with $\mathbb{E}[X_t] = a + (b-a)t/T$ and $\mathrm{Cov}(X_s, X_t) = s(T-t)/T$ for $s \leq t$.
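
The pinned construction translates directly into code. The sketch below (grid size and endpoints are arbitrary choices) simulates a BM path on a grid and then applies the formula above; the resulting path hits $a$ and $b$ exactly.

```python
import numpy as np

def brownian_bridge(a, b, T, n_steps, rng):
    """Sample a Brownian bridge from a (at t=0) to b (at t=T) on a uniform grid."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    # Brownian motion: cumulative sum of N(0, dt) increments, starting from 0
    B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n_steps))])
    # Pin the path: X_t = a + (b - a) * t/T + B_t - (t/T) * B_T
    return t, a + (b - a) * t / T + B - (t / T) * B[-1]

rng = np.random.default_rng(3)
t, x = brownian_bridge(a=0.0, b=1.0, T=1.0, n_steps=500, rng=rng)
print(x[0], x[-1])   # exactly a and b: the bridge is pinned at both endpoints
```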

For AI: Stochastic interpolation (Albergo & Vanden-Eijnden, 2023) uses Brownian bridges to construct generative models that interpolate between data and noise along curved paths, rather than straight (linear interpolation) or OU (DDPM) paths. Flow matching (Lipman et al., 2022) uses deterministic bridges (straight-line paths) for efficient training.

W.2 Conditional Score Function

Given a forward process $q(x_t \mid x_0)$, the conditional score is:

$$\nabla_{x_t}\log q(x_t \mid x_0) = \frac{-(x_t - \sqrt{\bar\alpha_t}\,x_0)}{1-\bar\alpha_t} = \frac{-\epsilon}{\sqrt{1-\bar\alpha_t}}$$

where $\epsilon$ is the noise added in the forward process. The marginal score satisfies:

$$\nabla_{x_t}\log q(x_t) = \mathbb{E}\!\left[\nabla_{x_t}\log q(x_t \mid x_0)\,\middle|\,x_t\right]$$

This is a conditional expectation - a Doob martingale evaluated at $t$! The diffusion model learns to estimate $\mathbb{E}[\epsilon \mid x_t]$, which via Tweedie's formula is equivalent to the MMSE denoiser $\mathbb{E}[x_0 \mid x_t]$. The entire diffusion-model training objective is a stochastic-process estimation problem formulated in martingale language.
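
A toy Monte Carlo illustration of the last identity, under an assumed one-dimensional Gaussian-mixture data distribution (the mixture, the value of $\bar\alpha_t$, and the kernel bandwidth are all illustrative): averaging conditional scores over samples whose $x_t$ lands near a query point recovers the marginal score, which is available in closed form for this toy example.

```python
import numpy as np

# Assumed toy data distribution for x_0: two-component Gaussian mixture
rng = np.random.default_rng(4)
n = 200_000
x0 = np.where(rng.random(n) < 0.5, rng.normal(-2.0, 0.3, n), rng.normal(2.0, 0.3, n))

abar = 0.5                                          # \bar\alpha_t at one fixed t (illustrative)
eps = rng.normal(size=n)
xt = np.sqrt(abar) * x0 + np.sqrt(1 - abar) * eps   # joint samples (x_0, x_t) from the forward process

def marginal_score_mc(x, bandwidth=0.05):
    """Estimate grad log q(x_t) at x as E[ grad log q(x_t | x_0) | x_t = x ]."""
    cond_score = -(x - np.sqrt(abar) * x0) / (1 - abar)   # conditional score at x, one value per x_0 sample
    w = np.exp(-0.5 * ((xt - x) / bandwidth) ** 2)        # soft conditioning on x_t being near x
    return np.sum(w * cond_score) / np.sum(w)

def marginal_score_exact(x):
    """Closed-form score of q(x_t): each mixture component stays Gaussian under the forward process."""
    def npdf(y, m, s):
        return np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    p, dp = 0.0, 0.0
    for m, s in [(-2.0, 0.3), (2.0, 0.3)]:
        mu, var = np.sqrt(abar) * m, abar * s ** 2 + (1 - abar)
        p += 0.5 * npdf(x, mu, np.sqrt(var))
        dp += 0.5 * npdf(x, mu, np.sqrt(var)) * (-(x - mu) / var)
    return dp / p

for x in (-1.0, 0.5, 2.0):
    print(x, marginal_score_mc(x), marginal_score_exact(x))
```

The Monte Carlo and closed-form values should agree up to sampling and bandwidth error, which is exactly the conditional-expectation identity the network exploits when it is trained to predict $\epsilon$ from $x_t$.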




Appendix X: Glossary

| Term | Definition |
| --- | --- |
| Adapted process | Process $X_t$ that is $\mathcal{F}_t$-measurable at each $t$ |
| Autocorrelation function (ACF) | $\rho(\tau) = \mathrm{Cov}(X_t, X_{t+\tau})/\mathrm{Var}(X_t)$ |
| Azuma's inequality | Concentration bound for martingales with bounded differences |
| Brownian motion | Continuous Gaussian process with independent, stationary increments |
| Cadlag | Right-continuous with left limits (French acronym) |
| Compensated Poisson | $N(t) - \lambda t$ - a martingale |
| Doob martingale | $M_k = \mathbb{E}[f(X_1,\dots,X_n) \mid \mathcal{F}_k]$ - reveal the arguments of $f$ one at a time |
| Ergodic theorem | Time average $\to$ ensemble average a.s. |
| Filtration | Increasing family of $\sigma$-algebras: $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s \leq t$ |
| Gaussian process (GP) | Process whose every finite-dimensional marginal is jointly Gaussian |
| Ito integral | Stochastic integral $\int H_t\,dB_t$ with integrand adapted to the BM filtration |
| Ito's lemma | Chain rule for BM: $df = f'\,dX + \frac{1}{2}f''\,(dX)^2$ |
| Kernel | Positive semidefinite function $k(x,x')$ - the covariance of a GP |
| Local martingale | Stopped process $M_{t\wedge\tau_n}$ is a martingale for each $n$ |
| Martingale | $\mathbb{E}[M_t \mid \mathcal{F}_s] = M_s$ for $s \leq t$ |
| Martingale difference | $D_k = M_k - M_{k-1}$ with $\mathbb{E}[D_k \mid \mathcal{F}_{k-1}] = 0$ |
| OU process | $dX = -\theta(X-\mu)\,dt + \sigma\,dB$ - mean-reverting Gaussian process |
| Optional Stopping Theorem | $\mathbb{E}[M_\tau] = \mathbb{E}[M_0]$ under appropriate conditions |
| Poisson process | Counting process with i.i.d. exponential inter-arrival times |
| Quadratic variation | $[B]_t = t$ for BM - central property of stochastic integration |
| Score function | $\nabla_x \log p_t(x)$ - gradient of the log-density |
| Stationary process | Distribution invariant under time shifts |
| Stopping time | $\tau$ with $\{\tau \leq t\} \in \mathcal{F}_t$ - no look-ahead |
| Submartingale | $\mathbb{E}[M_t \mid \mathcal{F}_s] \geq M_s$ |
| Supermartingale | $\mathbb{E}[M_t \mid \mathcal{F}_s] \leq M_s$ |
| Thinning | Poisson process subsampled with probability $p$ is Poisson($\lambda p$) |
| Uniform integrability | $\sup_t \mathbb{E}\big[\lvert M_t\rvert\,\mathbf{1}_{\{\lvert M_t\rvert > K\}}\big] \to 0$ as $K \to \infty$ |
| VP-SDE | Variance-preserving SDE - continuum limit of the DDPM forward process |
| Wide-sense stationary | Constant mean and covariance depending only on the lag |
| Wiener measure | Probability measure on $C([0,\infty))$ defining BM paths |
