Math for LLMs

Generative Models

Generative models learn to sample from, score, or transform data distributions. They are the mathematical foundation behind language generation, image generation, latent-variable modeling, diffusion models, and synthetic data.

Overview

The goal is to model:

p_\theta(x)\approx p_\mathrm{data}(x).

Different model families choose different paths: autoregressive factorization, latent-variable bounds, adversarial games, invertible transformations, or iterative denoising.

Prerequisites

  • Probability, likelihood, KL divergence, and ELBO
  • Neural networks and optimization
  • Autoregressive sequence probability
  • Basic Gaussian sampling and matrix operations

Companion Notebooks

| Notebook | Purpose |
| --- | --- |
| theory.ipynb | Demonstrates autoregressive likelihood, VAE reparameterization and KL, GAN losses, flow change of variables, diffusion noising, score updates, and FID intuition. |
| exercises.ipynb | Ten practice problems for generative-model objectives and diagnostics. |

Learning Objectives

After this section, you should be able to:

  • Compare autoregressive models, VAEs, GANs, flows, diffusion models, and score models.
  • Compute autoregressive log likelihood.
  • Write and interpret the VAE ELBO and reparameterization trick.
  • Explain GAN minimax training and mode collapse.
  • Apply the change-of-variables formula for a simple flow.
  • Simulate diffusion noising and denoising loss.
  • Explain score matching and guidance.
  • Evaluate generative models with likelihood, diversity, sample quality, and FID-style statistics.

Table of Contents

  1. Generative Modeling Goal
  2. Autoregressive Models
  3. Variational Autoencoders
  4. GANs
  5. Normalizing Flows
  6. Diffusion Models
  7. Score-Based View
  8. Evaluation
  9. Applications and Tradeoffs
  10. Diagnostics

Model Family Map

| Family | Likelihood | Sampling | Main strength | Main cost |
| --- | --- | --- | --- | --- |
| Autoregressive | Exact | Sequential | Strong likelihood and text generation | Serial generation |
| VAE | Lower bound | Fast | Latent representation | Blurry samples if decoder is weak |
| GAN | Usually unavailable | Fast | Sharp samples | Unstable training, mode collapse |
| Flow | Exact | Fast-ish | Exact density and invertibility | Architecture constraints |
| Diffusion | Variational/score view | Iterative | High sample quality and controllability | Many denoising steps |

1. Generative Modeling Goal

This part frames the generative modeling goal: learn a model of the data distribution that supports sampling, scoring, and conditioning.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Data distribution | learn a model of how examples are generated | $p_\mathrm{data}(x)$ |
| Model distribution | choose a parametric distribution family | $p_\theta(x)$ |
| Sampling | generate new examples from the model | $x\sim p_\theta$ |
| Likelihood | some models assign exact or approximate probabilities | $\log p_\theta(x)$ |
| Conditional generation | generate using labels, prompts, or context | $p_\theta(x\mid c)$ |

1.1 Data distribution

Main idea. Learn a model of how examples are generated.

Core relation:

p_\mathrm{data}(x)

Generative models differ in what they make easy. Autoregressive models make likelihood straightforward but sampling serial. VAEs make latent variables explicit but optimize a bound. GANs can sample sharply but lack direct likelihood. Flows give exact likelihood but require invertible architectures. Diffusion models train by denoising and sample through iterative refinement.

Worked micro-example. In a Gaussian VAE encoder, q_\phi(z\mid x)=N(\mu,\sigma^2). Instead of sampling z as an opaque random variable, write z=\mu+\sigma\epsilon with \epsilon\sim N(0,1). This keeps randomness outside the parameters and lets gradients flow through \mu and \sigma.

Implementation check. Always say what is being optimized: exact likelihood, ELBO, adversarial loss, score matching, or denoising loss. Different objectives imply different diagnostics.

AI connection. Every deployed generative system, from language models to image generators, is an estimate of some data distribution.

Common mistake. Do not compare generative models only by one metric. Likelihood, visual quality, diversity, controllability, sampling cost, and safety behavior can disagree.

1.2 Model distribution

Main idea. Choose a parametric distribution family.

Core relation:

p_\theta(x)


1.3 Sampling

Main idea. Generate new examples from the model.

Core relation:

x\sim p_\theta


1.4 Likelihood

Main idea. Some models assign exact or approximate probabilities.

Core relation:

\log p_\theta(x)


1.5 Conditional generation

Main idea. Generate using labels, prompts, or context.

Core relation:

p_\theta(x\mid c)


2. Autoregressive Models

This part studies autoregressive models, which factorize the data distribution into a product of conditional predictions.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Chain rule | factorize data into conditional predictions | $p(x_{1:T})=\prod_t p(x_t\mid x_{<t})$ |
| Teacher forcing | train conditionals with true previous tokens | $-\log p_\theta(x_t\mid x_{<t}^\star)$ |
| Sampling loop | generate one variable at a time | $x_t\sim p_\theta(\cdot\mid x_{<t})$ |
| Exact likelihood | log likelihood is a sum of conditional log probabilities | $\log p(x)=\sum_t\log p(x_t\mid x_{<t})$ |
| Cost | generation is sequential in the generated dimension | $O(T)$ serial steps |

2.1 Chain rule

Main idea. Factorize data into conditional predictions.

Core relation:

p(x_{1:T})=\prod_t p(x_t\mid x_{<t})


2.2 Teacher forcing

Main idea. Train conditionals with true previous tokens.

Core relation:

-\log p_\theta(x_t\mid x_{<t}^\star)
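
A minimal NumPy sketch of the teacher-forcing loss: each row of `logits` is the model's prediction given the true prefix, and the loss averages the negative log probability of the true next token. The logits and targets are made-up toy values.

```python
import numpy as np

def teacher_forcing_nll(logits, targets):
    # Average -log p_theta(x_t | x_<t), where each row of `logits`
    # was computed with the *true* previous tokens as context.
    logp = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return -np.mean(logp[np.arange(len(targets)), targets])

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.5,  0.0]])   # one row per position
targets = np.array([0, 1])              # ground-truth next tokens
print(teacher_forcing_nll(logits, targets))
```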


2.3 Sampling loop

Main idea. Generate one variable at a time.

Core relation:

x_t\sim p_\theta(\cdot\mid x_{<t})
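
A sketch of ancestral sampling. The `cond_probs` callable stands in for a trained model; the bigram table `P` is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sequence(cond_probs, T):
    # Draw x_t ~ p(. | x_<t) one position at a time.
    x = []
    for _ in range(T):
        p = cond_probs(x)
        x.append(int(rng.choice(len(p), p=p)))
    return x

# Toy "model": a fixed bigram table over a 3-token vocabulary.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
toy = lambda prefix: np.full(3, 1/3) if not prefix else P[prefix[-1]]
print(sample_sequence(toy, T=5))
```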


2.4 Exact likelihood

Main idea. Log likelihood is a sum of conditional log probabilities.

Core relation:

\log p(x)=\sum_t\log p(x_t\mid x_{<t})
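
The same toy bigram model makes the identity concrete: the sequence log probability is a running sum of conditional log probabilities.

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])   # toy bigram conditionals
prior = np.full(3, 1/3)           # p(x_1)

def log_likelihood(x):
    # log p(x) = log p(x_1) + sum_t log p(x_t | x_{t-1})
    logp = np.log(prior[x[0]])
    for prev, cur in zip(x[:-1], x[1:]):
        logp += np.log(P[prev, cur])
    return logp

print(log_likelihood([0, 0, 1, 1, 2]))
```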


2.5 Cost

Main idea. Generation is sequential in the generated dimension.

Core relation:

O(T) serial steps


3. Variational Autoencoders

This part studies variational autoencoders, which pair a latent-variable model with a tractable lower bound on the likelihood.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Latent variable model | generate data from a latent code | $p_\theta(x,z)=p(z)p_\theta(x\mid z)$ |
| Encoder | approximate posterior over latent variables | $q_\phi(z\mid x)$ |
| Decoder | map latent samples to data distribution | $p_\theta(x\mid z)$ |
| ELBO | optimize a tractable lower bound | $\log p_\theta(x)\ge E_q[\log p_\theta(x\mid z)]-D_\mathrm{KL}(q_\phi(z\mid x)\Vert p(z))$ |
| Reparameterization | sample differentiably from a Gaussian posterior | $z=\mu+\sigma\odot\epsilon,\ \epsilon\sim N(0,I)$ |

3.1 Latent variable model

Main idea. Generate data from a latent code.

Core relation:

p_\theta(x,z)=p(z)p_\theta(x\mid z)


3.2 Encoder

Main idea. Approximate posterior over latent variables.

Core relation:

q_\phi(z\mid x)


3.3 Decoder

Main idea. Map latent samples to data distribution.

Core relation:

p_\theta(x\mid z)


3.4 ELBO

Main idea. Optimize a tractable lower bound.

Core relation:

\log p_\theta(x)\ge E_q[\log p_\theta(x\mid z)]-D_\mathrm{KL}(q_\phi(z\mid x)\Vert p(z))

AI connection. This is the bridge from probabilistic latent variables to trainable variational autoencoders.
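
A minimal sketch of the two ELBO terms for a Gaussian encoder and decoder; the closed-form KL below holds for a diagonal Gaussian posterior against a standard normal prior, and the reconstruction term is written up to an additive constant.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def elbo(x, recon_mean, mu, logvar, obs_var=1.0):
    # Gaussian decoder: log p(x|z) = -||x - mean||^2 / (2 var) + const.
    recon = -0.5 * np.sum((x - recon_mean) ** 2) / obs_var
    return recon - gaussian_kl(mu, logvar)

x = np.array([0.5, -1.0])
print(elbo(x, recon_mean=np.array([0.4, -0.9]),
           mu=np.array([0.1, 0.2]), logvar=np.array([-0.5, -0.2])))
```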

3.5 Reparameterization

Main idea. Sample differentiably from a Gaussian posterior.

Core relation:

z=\mu+\sigma\odot\epsilon,\ \epsilon\sim N(0,I)
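
A sketch of the trick itself: the noise is drawn once, outside the parameters, so \mu and \sigma enter the sample purely through differentiable arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I); gradients flow
    # through mu and sigma because eps carries all the randomness.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

print(reparameterize(np.zeros(4), np.zeros(4)))
```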


4. GANs

This part studies GANs, which learn to sample through an adversarial game instead of an explicit likelihood.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Generator | map noise to synthetic samples | $x=G_\theta(z)$ |
| Discriminator | score whether samples look real | $D_\psi(x)\in(0,1)$ |
| Minimax objective | train generator and discriminator adversarially | $\min_G\max_D E_{x\sim p_\mathrm{data}}\log D(x)+E_z\log(1-D(G(z)))$ |
| Mode collapse | generator may cover only part of the data distribution | $p_G$ misses modes |
| No direct likelihood | standard GANs sample well but do not provide easy likelihoods | $\log p_G(x)$ unavailable |

4.1 Generator

Main idea. Map noise to synthetic samples.

Core relation:

x=G_\theta(z)


4.2 Discriminator

Main idea. Score whether samples look real.

Core relation:

D_\psi(x)\in(0,1)


4.3 Minimax objective

Main idea. Train generator and discriminator adversarially.

Core relation:

\min_G\max_D\ E_{x\sim p_\mathrm{data}}\log D(x)+E_z\log(1-D(G(z)))

AI connection. GANs trade likelihood for an adversarial learning signal that can produce sharp samples.
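
A sketch of the two losses implied by the minimax objective, given discriminator outputs on real and fake batches. The non-saturating generator loss shown is the common practical variant, not the literal minimax term.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Maximize log D(x) + log(1 - D(G(z))), i.e. minimize its negative.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)).
    return -np.mean(np.log(d_fake))

d_real, d_fake = np.array([0.9, 0.8]), np.array([0.2, 0.3])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```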

4.4 Mode collapse

Main idea. Generator may cover only part of the data distribution.

Core relation:

p_G misses modes


4.5 No direct likelihood

Main idea. Standard GANs sample well but do not provide easy likelihoods.

Core relation:

\log p_G(x) unavailable


5. Normalizing Flows

This part studies normalizing flows, which obtain exact likelihoods by transforming a simple base distribution through invertible maps.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Invertible map | transform simple noise into data with an invertible function | $x=f_\theta(z)$ |
| Change of variables | density uses the Jacobian determinant | $\log p_X(x)=\log p_Z(z)+\log\lvert\det\partial f^{-1}/\partial x\rvert$ |
| Exact likelihood | flows can train by maximum likelihood | $\max_\theta\sum_i\log p_\theta(x_i)$ |
| Architectural constraint | layers must be invertible and have tractable Jacobians | $\det J$ tractable |
| Sampling | sample $z$, then apply $f$ | $z\sim p_Z,\ x=f_\theta(z)$ |

5.1 Invertible map

Main idea. Transform simple noise into data with an invertible function.

Core relation:

x=f_\theta(z)


5.2 Change of variables

Main idea. The density uses the Jacobian determinant.

Core relation:

\log p_X(x)=\log p_Z(z)+\log|\det\partial f^{-1}/\partial x|

AI connection. Flows are powerful because they keep exact likelihood through invertible maps.
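
A one-dimensional affine flow makes the formula checkable by hand: with x = a z + b, the inverse is (x - b)/a and the log-Jacobian term is -log|a|.

```python
import numpy as np

def affine_flow_logp(x, a, b):
    # Base z ~ N(0, 1), flow x = f(z) = a * z + b (a != 0).
    # log p_X(x) = log p_Z((x - b)/a) - log|a|.
    z = (x - b) / a
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))
    return log_pz - np.log(np.abs(a))

print(affine_flow_logp(x=1.0, a=2.0, b=0.5))
```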

5.3 Exact likelihood

Main idea. Flows can train by maximum likelihood.

Core relation:

\max_\theta\sum_i\log p_\theta(x_i)


5.4 Architectural constraint

Main idea. Layers must be invertible and have tractable Jacobians.

Core relation:

\det J tractable


5.5 Sampling

Main idea. Sample z from the base distribution, then apply f.

Core relation:

z\sim p_Z,\ x=f_\theta(z)


6. Diffusion Models

This part studies diffusion models, which corrupt data with noise and learn to reverse the corruption step by step.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Forward noising | gradually corrupt data with Gaussian noise | $q(x_t\mid x_{t-1})$ |
| Closed-form noising | sample noisy $x_t$ directly from $x_0$ | $x_t=\sqrt{\bar\alpha_t}x_0+\sqrt{1-\bar\alpha_t}\epsilon$ |
| Denoising model | learn to predict noise or clean data | $\epsilon_\theta(x_t,t)$ |
| Training objective | common DDPM loss predicts added noise | $E\Vert\epsilon-\epsilon_\theta(x_t,t)\Vert^2$ |
| Reverse sampling | start from noise and denoise step by step | $p_\theta(x_{t-1}\mid x_t)$ |

6.1 Forward noising

Main idea. Gradually corrupt data with Gaussian noise.

Core relation:

q(x_t\mid x_{t-1})


6.2 Closed-form noising

Main idea. Sample noisy x_t directly from x_0.

Core relation:

x_t=\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon
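
A sketch of the closed-form forward step, assuming the common linear beta schedule; alpha-bar is the cumulative product of (1 - beta).

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # a common linear schedule
alpha_bar = np.cumprod(1.0 - betas)

def noisy_sample(x0, t):
    # x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

print(noisy_sample(np.array([1.0, -1.0]), t=500))
```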


6.3 Denoising model

Main idea. Learn to predict noise or clean data.

Core relation:

\epsilon_\theta(x_t,t)


6.4 Training objective

Main idea. The common DDPM loss predicts the added noise.

Core relation:

E\Vert\epsilon-\epsilon_\theta(x_t,t)\Vert^2

AI connection. Diffusion training often becomes a supervised denoising problem.
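
One stochastic training step under the simple objective, with a dummy stand-in for the noise-prediction network:

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def ddpm_training_step(x0, eps_model):
    # Draw t and eps, form x_t in closed form, regress the model onto eps.
    t = int(rng.integers(len(alpha_bar)))
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(xt, t)) ** 2)

dummy_model = lambda xt, t: np.zeros_like(xt)   # stand-in for eps_theta
print(ddpm_training_step(np.array([1.0, -1.0]), dummy_model))
```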

6.5 Reverse sampling

Main idea. Start from noise and denoise step by step.

Core relation:

p_\theta(x_{t-1}\mid x_t)


7. Score-Based View

This part studies the score-based view, which works with the gradient of the log density rather than the density itself.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Score | gradient of log density with respect to data | $s(x)=\nabla_x\log p(x)$ |
| Denoising score matching | learn scores from noisy samples | $s_\theta(x_t,t)$ |
| Langevin sampling | move samples toward high-density regions plus noise | $x\leftarrow x+\eta s_\theta(x)+\sqrt{2\eta}\xi$ |
| SDE view | continuous-time noising and denoising processes | $dx=f(x,t)dt+g(t)dw$ |
| Guidance | condition generation by modifying scores or logits | $s_\mathrm{guided}=s_\mathrm{uncond}+w(s_\mathrm{cond}-s_\mathrm{uncond})$ |

7.1 Score

Main idea. Gradient of log density with respect to data.

Core relation:

s(x)=\nabla_x\log p(x)


7.2 Denoising score matching

Main idea. Learn scores from noisy samples.

Core relation:

s_\theta(x_t,t)


7.3 Langevin sampling

Main idea. Move samples toward high-density regions plus noise.

Core relation:

x\leftarrow x+\eta s_\theta(x)+\sqrt{2\eta}\,\xi
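
For a standard Gaussian target the score is exactly -x, which makes the update easy to sanity-check: iterates started far away drift toward the mode and then fluctuate around it.

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_step(x, score, eta):
    # x <- x + eta * s(x) + sqrt(2 eta) * xi, xi ~ N(0, I)
    return x + eta * score(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)

score = lambda x: -x                 # exact score of N(0, I)
x = rng.standard_normal(2) * 5.0     # start far from the mode
for _ in range(1000):
    x = langevin_step(x, score, eta=1e-2)
print(x)                             # should hover near the origin
```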


7.4 SDE view

Main idea. Continuous-time noising and denoising processes.

Core relation:

dx=f(x,t)\,dt+g(t)\,dw


7.5 Guidance

Main idea. Condition generation by modifying scores or logits.

Core relation:

s_\mathrm{guided}=s_\mathrm{uncond}+w(s_\mathrm{cond}-s_\mathrm{uncond})

AI connection. Guidance is one reason conditional diffusion models can trade diversity for prompt adherence.
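
A sketch of the guidance combination from the formula above; the score vectors are made-up stand-ins for model outputs.

```python
import numpy as np

def guided_score(s_uncond, s_cond, w):
    # w = 0 ignores the condition, w = 1 recovers the conditional
    # score, and w > 1 trades diversity for stronger adherence.
    return s_uncond + w * (s_cond - s_uncond)

s_u, s_c = np.array([0.2, -0.1]), np.array([0.6, 0.3])
print(guided_score(s_u, s_c, w=3.0))
```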

8. Evaluation

This part studies evaluation: how to measure a generative model's likelihood, sample quality, and diversity.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Log likelihood | evaluate density when tractable | $\log p_\theta(x)$ |
| Sample quality | visual or task-specific quality of generated examples | $S_\mathrm{quality}$ |
| Diversity | model should cover data modes | $\mathrm{support}(p_\theta)$ |
| FID intuition | compare feature means and covariances | $\Vert\mu_r-\mu_g\Vert^2+\mathrm{Tr}(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2})$ |
| Precision and recall | separate sample fidelity from mode coverage | $\mathrm{precision},\mathrm{recall}$ |

8.1 Log likelihood

Main idea. Evaluate density when tractable.

Core relation:

\log p_\theta(x)


8.2 Sample quality

Main idea. Visual or task-specific quality of generated examples.

Core relation:

S_\mathrm{quality}


8.3 Diversity

Main idea. Model should cover data modes.

Core relation:

\mathrm{support}(p_\theta)


8.4 FID intuition

Main idea. Compare feature means and covariances.

Core relation:

\Vert\mu_r-\mu_g\Vert^2+\mathrm{Tr}(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2})
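
A minimal sketch of the Frechet distance between two Gaussians fitted to feature statistics, using SciPy's matrix square root. Real FID pipelines extract the features with a pretrained network, which is out of scope here.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu_r, cov_r, mu_g, cov_g):
    # ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2})
    covmean = sqrtm(cov_r @ cov_g).real   # drop tiny imaginary parts
    return np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2 * covmean)

mu_r, mu_g = np.zeros(2), np.array([0.5, 0.0])
cov_r, cov_g = np.eye(2), 1.5 * np.eye(2)
print(fid(mu_r, cov_r, mu_g, cov_g))
```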


8.5 Precision and recall

Main idea. Separate sample fidelity from mode coverage.

Core relation:

\mathrm{precision},\ \mathrm{recall}

Precision asks whether generated samples look like real data (fidelity); recall asks whether real data is reachable by the generator (coverage). A common estimator builds k-nearest-neighbor balls around each feature set: a generated point counts as precise if it lands inside some real point's ball, and a real point counts as recalled if it lands inside some generated point's ball. A sketch follows below.

Common mistake. Reporting a single combined score. Precision and recall can trade off sharply, for example when truncating samples toward high-density regions.
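A simplified nearest-neighbor estimator in the spirit of improved precision/recall metrics; the choice k=3 and the use of raw feature-space distances are assumptions for illustration:

```python
import numpy as np

def knn_radii(X, k):
    # distance from each point to its k-th nearest neighbor within X
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)            # column 0 is the point itself (distance 0)
    return d[:, k]

def precision_recall(real, fake, k=3):
    r_real = knn_radii(real, k)
    r_fake = knn_radii(fake, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    # precision: fake point falls inside some real point's k-NN ball
    precision = np.mean((d <= r_real[None, :]).any(axis=1))
    # recall: real point falls inside some fake point's k-NN ball
    recall = np.mean((d.T <= r_fake[None, :]).any(axis=1))
    return precision, recall

rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(200, 2))
fake = rng.normal(0, 1, size=(200, 2)) * 0.3   # concentrated samples
print(precision_recall(real, fake))            # high precision, low recall
```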

9. Applications and Tradeoffs

This part surveys where each model family is used in practice and the tradeoffs that drive those choices.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Text | autoregressive LMs dominate discrete sequence generation | p(x_t\mid x_{<t}) |
| Images | diffusion and autoregressive models are common high-quality image generators | p(x\mid c) |
| Representation learning | VAEs learn latent spaces | z |
| Data augmentation | synthetic samples can help or hurt depending on quality | D_\mathrm{real}\cup D_\mathrm{synthetic} |
| Safety and misuse | generation systems need provenance, filtering, and evaluation | \mathrm{risk} |

9.1 Text

Main idea. Autoregressive LMs dominate discrete sequence generation.

Core relation:

p(x_t\mid x_{<t})

The chain rule factorizes the sequence probability exactly, so training reduces to next-token cross-entropy and the log likelihood of a sequence is a sum of per-token log probabilities. The cost is that sampling is serial: each token must wait for the previous one.

Implementation check. Report likelihoods in consistent units (nats versus bits, per token versus per sequence) when comparing models or tokenizers.
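A minimal sketch of sequence log likelihood computed from per-step logits; the random logits array is a stand-in for a real model's output:

```python
import numpy as np

def sequence_log_likelihood(logits, tokens):
    """Sum of log p(x_t | x_<t).

    logits: (T, V) array; row t scores position t given the prefix x_<t.
    tokens: (T,) array of observed token ids.
    """
    # numerically stable log-softmax
    m = logits.max(axis=1, keepdims=True)
    log_z = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    log_probs = logits - log_z
    return float(log_probs[np.arange(len(tokens)), tokens].sum())

rng = np.random.default_rng(0)
print(sequence_log_likelihood(rng.normal(size=(5, 10)),
                              np.array([1, 4, 2, 0, 7])))
```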

9.2 Images

Main idea. Diffusion and autoregressive models are common high-quality image generators.

Core relation:

p(x\mid c)

State-of-the-art image generators are usually conditional: the model learns p(x | c) for a class label or text prompt. Diffusion systems strengthen conditioning at sampling time with classifier-free guidance, which extrapolates from the unconditional prediction toward the conditional one, and latent diffusion runs the denoising process in a compressed latent space to cut cost.

Common mistake. Treating the guidance weight as free quality: large weights sharpen samples but reduce diversity and can introduce artifacts.
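A sketch of the classifier-free guidance combination applied at each denoising step; the default weight here is illustrative, not a recommendation:

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, w=5.0):
    # w = 1 recovers the conditional prediction;
    # w > 1 extrapolates past it for stronger conditioning
    return eps_uncond + w * (eps_cond - eps_uncond)
```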

9.3 Representation learning

Main idea. VAEs learn latent spaces.

Core relation:

z

The encoder compresses x into a latent z whose distribution is pulled toward the prior by the KL term in the ELBO. The resulting latent space supports interpolation, clustering, and downstream prediction, which is often the point of training a VAE even when sample quality is secondary.

Common mistake. Letting the KL term collapse to zero (posterior collapse), which leaves a powerful decoder that ignores z entirely.
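The two computations behind this, the diagonal-Gaussian KL to a standard normal prior and the reparameterized sample, in a minimal numpy sketch:

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def reparameterize(mu, log_var, seed=0):
    # z = mu + sigma * eps keeps randomness outside the parameters
    eps = np.random.default_rng(seed).standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.array([0.5, -0.3])
log_var = np.array([0.0, -1.0])
print(gaussian_kl_to_standard_normal(mu, log_var))
print(reparameterize(mu, log_var))
```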

9.4 Data augmentation

Main idea. Synthetic samples can help or hurt depending on quality.

Core relation:

D_\mathrm{real}\cup D_\mathrm{synthetic}

Synthetic samples expand a training set only when the generator is faithful and diverse; low-quality or mode-dropped samples inject bias, and repeatedly training new models on the outputs of old ones is known to degrade quality over generations. Treat the synthetic fraction as a hyperparameter and validate on held-out real data.

Implementation check. Track model quality as a function of the real-to-synthetic ratio rather than committing to a fixed mix.
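A minimal sketch of mixing real and synthetic arrays at a chosen synthetic fraction; `synth_frac` is a knob to ablate, not a recommended value:

```python
import numpy as np

def mix_datasets(real, synthetic, synth_frac=0.2, seed=0):
    """Return a training set in which fraction synth_frac is synthetic."""
    rng = np.random.default_rng(seed)
    n_synth = int(len(real) * synth_frac / (1.0 - synth_frac))
    idx = rng.choice(len(synthetic), size=min(n_synth, len(synthetic)),
                     replace=False)
    return np.concatenate([real, synthetic[idx]], axis=0)
```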

9.5 Safety and misuse

Main idea. Generation systems need provenance, filtering, and evaluation.

Core relation:

\mathrm{risk}

Generation systems create risks that sample-quality metrics do not capture: impersonation, misinformation, and training-data leakage. Mitigations span the pipeline, including provenance signals such as watermarking and metadata, filtering of both training data and outputs, and adversarial (red-team) evaluation before deployment.

Common mistake. Treating safety as a post-hoc filter. Provenance and evaluation need to be designed alongside the model objective, not bolted on.

10. Diagnostics

This part collects practical checks for debugging and comparing generative models.

| Subtopic | Question | Formula |
| --- | --- | --- |
| Likelihood versus samples | good likelihood and good samples do not always align | \log p versus visual quality |
| Latent traversals | inspect smoothness and meaning of latent directions | z+\alpha v |
| Mode coverage | check diversity and rare classes | \mathrm{coverage} |
| Denoising curves | track loss by timestep | L_t |
| Ablations | compare architecture, objective, guidance, sampling steps, and conditioning | \Delta S,\ \Delta T |

10.1 Likelihood versus samples

Main idea. Good likelihood and good samples do not always align.

Core relation:

\log p versus visual quality

A model can achieve strong likelihood by spreading mass broadly over the data space while its samples look poor, and a GAN can produce sharp samples without assigning likelihoods at all. The two axes measure different things, so report them separately.

Common mistake. Concluding that higher log likelihood implies better samples, or the reverse; the correlation is weak across model families.

10.2 Latent traversals

Main idea. Inspect smoothness and meaning of latent directions.

Core relation:

z+\alpha v

Pick a base latent z and a direction v, decode z + αv for a sweep of α values, and inspect the outputs. Smooth, semantically consistent changes suggest a well-structured latent space; abrupt or incoherent changes suggest entanglement or holes in the prior.
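A minimal traversal sketch; `decode` is a hypothetical decoder mapping latent vectors to samples, shown here with a toy identity stand-in:

```python
import numpy as np

def latent_traversal(decode, z, v, alphas=np.linspace(-3, 3, 7)):
    """Decode points along a latent direction v from base latent z."""
    v = v / np.linalg.norm(v)          # unit direction
    return [decode(z + a * v) for a in alphas]

# toy demo with an identity "decoder"
frames = latent_traversal(lambda z: z, np.zeros(4),
                          np.array([1.0, 0.0, 0.0, 0.0]))
print(frames[0], frames[-1])
```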

10.3 Mode coverage

Main idea. Check diversity and rare classes.

Core relation:

\mathrm{coverage}

Label a large batch of generated samples with a pretrained classifier and compare the class histogram with the training distribution. Missing or strongly underrepresented classes indicate mode dropping, which matters especially for rare classes that aggregate metrics like FID can hide.
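A minimal coverage check, assuming class labels for both sets come from some classifier:

```python
import numpy as np

def class_coverage(real_labels, fake_labels, n_classes):
    # fraction of classes present in real data that also
    # appear among generated samples
    real = np.bincount(real_labels, minlength=n_classes) > 0
    fake = np.bincount(fake_labels, minlength=n_classes) > 0
    return (real & fake).sum() / real.sum()

real_labels = np.array([0, 1, 2, 3, 3])
fake_labels = np.array([0, 0, 1, 1, 3])
print(class_coverage(real_labels, fake_labels, n_classes=4))  # 0.75
```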

10.4 Denoising curves

Main idea. Track loss by timestep.

Core relation:

L_t

Plot the denoising loss L_t as a function of timestep t. A healthy diffusion model shows a smooth curve; spikes at particular timesteps point to noise-schedule problems, and a curve dominated by a few timesteps suggests the loss weighting needs adjustment.
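A sketch of per-timestep epsilon-prediction MSE for a DDPM-style model; `model(xt, t)` is a hypothetical noise predictor, replaced here by a dummy:

```python
import numpy as np

def denoising_losses(model, x0, alphas_bar, seed=0):
    """Per-timestep denoising MSE.

    alphas_bar[t] is the cumulative product of (1 - beta) up to step t.
    """
    rng = np.random.default_rng(seed)
    losses = []
    for t, ab in enumerate(alphas_bar):
        eps = rng.standard_normal(x0.shape)
        xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps   # forward noising
        losses.append(np.mean((model(xt, t) - eps) ** 2))
    return np.array(losses)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 8))
alphas_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 50))
dummy = lambda xt, t: np.zeros_like(xt)   # stand-in noise predictor
print(denoising_losses(dummy, x0, alphas_bar)[:5])
```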

10.5 Ablations

Main idea. Compare architecture, objective, guidance, sampling steps, and conditioning.

Core relation:

\Delta S,\ \Delta T

Change one factor at a time (objective, guidance weight, number of sampling steps, conditioning scheme) and record the change in sample quality ΔS against the change in cost ΔT. Ablations are the only reliable way to attribute quality gains to a specific design choice.

Common mistake. Varying several factors at once and crediting the wrong one.
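A minimal one-factor-at-a-time sweep; `run_eval`, the config keys, and the grid values are hypothetical placeholders:

```python
def ablate(run_eval, base_config, grid):
    """Vary one knob at a time; return metric deltas versus baseline.

    run_eval: hypothetical function mapping a config dict to a scalar metric.
    grid: {knob_name: [values to try]}.
    """
    base = run_eval(base_config)
    return {(knob, value): run_eval({**base_config, knob: value}) - base
            for knob, values in grid.items() for value in values}

# toy demo: quality rises with guidance weight, falls with more steps
print(ablate(lambda cfg: cfg["w"] - 0.01 * cfg["steps"],
             {"steps": 50, "w": 1.0}, {"w": [2.0, 5.0]}))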


Practice Exercises

  1. Compute autoregressive log likelihood.
  2. Compute a VAE Gaussian KL to a standard normal prior.
  3. Apply the reparameterization trick.
  4. Compute GAN discriminator and generator losses.
  5. Apply a 1D flow change-of-variables formula.
  6. Simulate one diffusion noising step.
  7. Compute a denoising MSE loss.
  8. Take one score-based Langevin update.
  9. Compute a simplified FID-style distance.
  10. Write a generative-model debugging checklist.

Why This Matters for AI

Modern AI is largely generative: LLMs generate text, diffusion models generate images, VAEs and flows model latent structure, and synthetic data systems generate training examples. Understanding the objective behind each model prevents shallow comparisons.

Bridge to CNN and Convolution Math

Image generators often use convolutional or attention-based backbones. The next section studies convolution math, a key building block for many vision generators and discriminators.

References