Discrete Fourier Transform and FFT: Part 1: Intuition to 6. Frequency Resolution and Zero-Padding

1. Intuition

1.1 From Continuous to Discrete: Sampling a Spectrum

Every practical computation with signals must at some point confront a fundamental constraint: computers cannot store or process continuous functions. What they can store is a finite sequence of numbers - samples of a signal taken at discrete time instants. The passage from the continuous world of the Fourier Transform to the discrete world of the DFT is therefore not a mathematical nicety but a computational necessity.

Suppose we record $N$ samples of a signal $x(t)$ at a sampling rate of $f_s$ samples per second:

$$x[n] = x(n \cdot T_s), \quad T_s = \frac{1}{f_s}, \quad n = 0, 1, \ldots, N-1$$

The total observation time is $T = N \cdot T_s$. What frequencies can we see in this finite record? Two constraints emerge immediately:

  1. Highest observable frequency (Nyquist): any frequency above $f_s/2$ will be indistinguishable from a lower frequency due to aliasing - two distinct sinusoids will produce identical sample sequences.
  2. Frequency resolution: the smallest frequency difference we can distinguish is $\Delta f = f_s / N = 1/T$ - the reciprocal of the observation duration.

The DFT honours both constraints. It examines the signal at exactly $N$ equally-spaced frequency bins:

$$\xi_k = \frac{k \cdot f_s}{N}, \quad k = 0, 1, \ldots, N-1$$

covering one full period $[0, f_s)$ of the periodic spectrum. The first $N/2$ bins correspond to positive frequencies $[0, f_s/2)$; the remaining $N/2$ bins correspond to negative frequencies $(-f_s/2, 0)$ wrapped around to $[f_s/2, f_s)$. This wrapping is the discrete manifestation of the Nyquist limit.
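A quick way to see this bin layout in code is np.fft.fftfreq, which returns the frequency each DFT bin represents; the sampling rate and length below are arbitrary illustrative values:

```python
import numpy as np

fs = 8.0      # sampling rate in Hz (illustrative value)
N = 8         # number of samples

# np.fft.fftfreq lists the positive frequencies first, then the negative
# frequencies that the bins k >= N/2 actually represent.
freqs = np.fft.fftfreq(N, d=1/fs)
print(freqs)                      # [ 0.  1.  2.  3. -4. -3. -2. -1.]

# The "unwrapped" axis [0, fs) described above is simply k * fs / N:
print(np.arange(N) * fs / N)      # [0. 1. 2. 3. 4. 5. 6. 7.]
```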

For AI: Every neural network that processes time-series data - speech, ECG, seismic, financial - ultimately works with sampled signals. Understanding the frequency axis setup of the DFT is essential for interpreting what spectral features (mel filterbanks in Whisper, power spectral density in EEG classifiers) actually represent. Misinterpreting negative-frequency bins is a common source of subtle bugs in spectrogram preprocessing.

1.2 What the DFT Does Geometrically

The space of $N$-point complex sequences $\mathbb{C}^N$ has dimension $N$. The DFT provides a complete orthogonal basis for this space - the Fourier basis - consisting of $N$ complex sinusoids at the frequencies $\xi_k = k/N$:

$$\mathbf{f}_k = \begin{bmatrix} 1 \\ \omega_N^k \\ \omega_N^{2k} \\ \vdots \\ \omega_N^{(N-1)k} \end{bmatrix}, \quad \omega_N = e^{2\pi i / N}, \quad k = 0, 1, \ldots, N-1$$

Each basis vector $\mathbf{f}_k$ is a sampled complex sinusoid at frequency $k/N$ cycles per sample. The DFT coefficient $X[k]$ is the inner product of the signal $\mathbf{x}$ with the $k$-th basis vector:

$$X[k] = \langle \mathbf{x}, \mathbf{f}_k \rangle = \sum_{n=0}^{N-1} x[n]\, \overline{\omega_N^{nk}} = \sum_{n=0}^{N-1} x[n]\, \omega_N^{-nk}$$

The DFT is therefore a change of basis from the standard basis (time domain) to the Fourier basis (frequency domain). The magnitude $|X[k]|$ measures how strongly the signal oscillates at frequency $k/N$ cycles per sample; the phase $\angle X[k]$ measures the phase of that oscillation.

The inverse DFT reconstructs the signal by superimposing all $N$ sinusoids with their computed amplitudes and phases:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, \omega_N^{nk}$$

The factor $1/N$ comes from the normalization of the basis - the $\mathbf{f}_k$ have squared norm $\lVert \mathbf{f}_k \rVert^2 = N$, not 1. Using the orthonormal basis $\mathbf{f}_k / \sqrt{N}$, both forward and inverse transforms carry a $1/\sqrt{N}$ factor. Different software packages use different conventions; always check.

Geometric picture: if you think of the signal as a vector in $\mathbb{C}^N$, the DFT rotates this vector into a new coordinate system. The new coordinate axes are complex sinusoids. The DFT matrix $F_N$ is the rotation matrix (more precisely, a scaled unitary matrix). The operation $\mathbf{X} = F_N \mathbf{x}$ is a matrix-vector product; naive evaluation takes $O(N^2)$ operations. The FFT computes this same product in $O(N \log N)$.
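The change-of-basis picture can be checked directly by building $F_N$ and comparing the matrix-vector product against np.fft.fft. A minimal sketch of the $O(N^2)$ path (for illustration, not production use):

```python
import numpy as np

def dft_matrix(N: int) -> np.ndarray:
    """DFT matrix with entries omega_N^{-kn} (engineering sign convention)."""
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N)

N = 16
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

X_matrix = dft_matrix(N) @ x        # O(N^2) matrix-vector product
X_fft = np.fft.fft(x)               # O(N log N) FFT, same result

print(np.allclose(X_matrix, X_fft))  # True
```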

1.3 Why O(N log N) Changed the World

To appreciate the scale of the FFT's impact, consider the arithmetic:

| $N$ | Naive DFT ($N^2$) | FFT ($N \log_2 N$) | Speedup |
|---|---|---|---|
| 64 | 4,096 | 384 | 11 |
| 1,024 | 1,048,576 | 10,240 | 102 |
| 65,536 | $4.3 \times 10^9$ | $1.0 \times 10^6$ | 4,369 |
| $10^6$ | $10^{12}$ | $2.0 \times 10^7$ | 50,000 |

Before the FFT, computing the spectrum of a 1-second audio clip sampled at 44.1 kHz was infeasible in real time. After the FFT, it takes microseconds. The entire modern infrastructure of digital audio, wireless communications (OFDM), radar, MRI, seismology, and antenna design rests on this algorithmic breakthrough.

The secret is that the DFT matrix $F_N$ has enormous structure: it is a Vandermonde matrix with entries that are powers of the $N$-th roots of unity, and these entries satisfy recursive relationships that allow massive cancellation of work through a divide-and-conquer decomposition.

For AI (2026): Long-context language models face a quadratic bottleneck in self-attention: processing a 100K-token context with full attention requires $\sim 10^{10}$ operations. The FFT offers an alternative: if a sequence operation can be formulated as a convolution, it can be computed in $O(N \log N)$ using the FFT. This is the foundation of FNet (Lee-Thorp et al., 2022), FlashFFTConv (Dao et al., 2023), S4 (Gu et al., 2022), and the Fourier Neural Operator (Li et al., 2021). The $O(N \log N)$ vs $O(N^2)$ complexity gap, already dramatic at $N = 10^3$, becomes existential at $N = 10^5$.

1.4 Historical Timeline

The story of the FFT is a case study in mathematical rediscovery and the transformative effect of the right algorithm at the right moment.

1805 - Carl Friedrich Gauss derived what we now recognize as the FFT algorithm while computing asteroid orbits. His manuscript lay unpublished and largely unknown.

1903 - Carl Runge independently derived a related fast algorithm for trigonometric interpolation.

1942 - Danielson and Lanczos published a recursive factorization of the DFT based on even-odd splitting, establishing the core mathematical structure.

1965 - James Cooley and John Tukey published "An Algorithm for the Machine Calculation of Complex Fourier Series," the paper that launched the digital signal processing revolution. Their algorithm was motivated by Tukey's desire to detect Soviet nuclear tests through seismic monitoring. The paper, only a few pages long, is one of the most cited in the history of applied mathematics.

1966 - Gentleman and Sande introduced the "decimation in frequency" variant; Bergland gave a comprehensive tutorial; the algorithm became standard in every engineering discipline within a decade.

1997 - Frigo and Johnson released FFTW ("Fastest Fourier Transform in the West"), an adaptive library that benchmarks and selects the optimal algorithm for any $N$ and hardware. MATLAB and Julia use FFTW under the hood; NumPy and SciPy ship the comparable pocketfft library, so essentially all scientific computing FFT pipelines rest on this family of implementations.

2021 - Fourier Neural Operator (Li et al.) applies the DFT inside a neural network layer to learn solution operators for PDEs in O(NlogN)O(N \log N).

2022 - Monarch matrices (Dao et al.) parameterize FFT-like structured transforms as products of sparse butterfly matrices, enabling learnable near-FFT computations in transformer architectures.


2. Formal Definitions

2.1 The N-Point DFT

Definition (DFT). Let $\mathbf{x} = (x[0], x[1], \ldots, x[N-1]) \in \mathbb{C}^N$ be a finite sequence. The $N$-point Discrete Fourier Transform of $\mathbf{x}$ is the sequence $\mathbf{X} = (X[0], X[1], \ldots, X[N-1]) \in \mathbb{C}^N$ defined by:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, \omega_N^{-nk}, \quad k = 0, 1, \ldots, N-1$$

where $\omega_N = e^{2\pi i / N}$ is the primitive $N$-th root of unity.

The complex number $\omega_N^{-nk} = e^{-2\pi i nk/N}$ is a sampled complex sinusoid at frequency $k/N$ cycles per sample, evaluated at time $n$. The DFT coefficient $X[k]$ measures the complex amplitude (magnitude and phase) of the frequency-$k$ component of $\mathbf{x}$.

Sign convention. The minus sign in $\omega_N^{-nk}$ is the physics/engineering convention (analysis kernel $e^{-2\pi i nk/N}$). Some mathematical texts use $\omega_N^{+nk}$ (positive sign) for the forward transform; this merely relabels forward and inverse. We follow NumPy/SciPy: the forward DFT carries the minus sign.

Roots of unity. The $N$ values $\{\omega_N^0, \omega_N^1, \ldots, \omega_N^{N-1}\}$ are the $N$-th roots of unity - equally spaced points on the unit circle in $\mathbb{C}$, starting at 1 and rotating counterclockwise by $2\pi/N$ at each step. They satisfy:

$$\sum_{n=0}^{N-1} \omega_N^{nk} = \begin{cases} N & \text{if } k \equiv 0 \pmod{N} \\ 0 & \text{otherwise} \end{cases}$$

This orthogonality of roots of unity is the key identity underlying all DFT properties and the inversion formula.

Standard examples for small NN:

$N=2$: $\omega_2 = e^{i\pi} = -1$. The 2-point DFT is:

$$X[0] = x[0] + x[1], \quad X[1] = x[0] - x[1]$$

This is the Hadamard/Walsh transform in 2D - just a sum and difference.

$N=4$: $\omega_4 = e^{i\pi/2} = i$. The 4-point DFT:

$$X[0] = x[0]+x[1]+x[2]+x[3], \quad X[1] = x[0] - ix[1] - x[2] + ix[3]$$
$$X[2] = x[0]-x[1]+x[2]-x[3], \quad X[3] = x[0]+ix[1]-x[2]-ix[3]$$

Non-example: An infinite sequence $x[n]$ for $n \in \mathbb{Z}$ does NOT have a DFT - you must first window or truncate it to $N$ points. Applying the DFT directly to an infinite sequence is a category error; what you would compute is the $z$-transform or DTFT (Discrete-Time Fourier Transform), which is distinct from the DFT.

2.2 Inverse DFT and Normalization Conventions

Definition (IDFT). The Inverse Discrete Fourier Transform is:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, \omega_N^{nk}, \quad n = 0, 1, \ldots, N-1$$

Verification. Substitute the DFT into the IDFT:

$$\frac{1}{N}\sum_k X[k]\,\omega_N^{nk} = \frac{1}{N}\sum_k \left(\sum_m x[m]\,\omega_N^{-mk}\right)\omega_N^{nk} = \sum_m x[m] \cdot \frac{1}{N}\sum_k \omega_N^{(n-m)k} = \sum_m x[m]\, \delta[n-m] = x[n]$$

using the orthogonality of roots of unity in the penultimate step.

Three normalization conventions (all valid, software-dependent):

| Convention | Forward DFT | Inverse DFT | Used by |
|---|---|---|---|
| Engineering (default) | $X[k] = \sum_n x[n]\,\omega_N^{-nk}$ | $x[n] = \frac{1}{N}\sum_k X[k]\,\omega_N^{nk}$ | NumPy, SciPy, MATLAB |
| Symmetric (unitary) | $X[k] = \frac{1}{\sqrt{N}}\sum_n x[n]\,\omega_N^{-nk}$ | $x[n] = \frac{1}{\sqrt{N}}\sum_k X[k]\,\omega_N^{nk}$ | Some textbooks |
| Inverse-normalized | $X[k] = \frac{1}{N}\sum_n x[n]\,\omega_N^{-nk}$ | $x[n] = \sum_k X[k]\,\omega_N^{nk}$ | Rare; avoid |

For AI: Always check which convention a library uses. NumPy uses the engineering convention: np.fft.fft has no $1/N$ factor, np.fft.ifft has the $1/N$. PyTorch's torch.fft.fft matches NumPy. When mixing libraries or implementing from scratch, convention mismatch is a common silent bug.
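Recent NumPy versions expose all three conventions through the norm argument ("backward", "ortho", "forward"); a quick consistency check:

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(8)
N = len(x)

X_eng   = np.fft.fft(x)                    # engineering: no 1/N on the forward pass
X_ortho = np.fft.fft(x, norm="ortho")      # unitary: 1/sqrt(N) on both passes
X_fwd   = np.fft.fft(x, norm="forward")    # 1/N on the forward pass

print(np.allclose(X_ortho, X_eng / np.sqrt(N)))   # True
print(np.allclose(X_fwd,   X_eng / N))            # True

# Round trips are exact under every convention
print(np.allclose(np.fft.ifft(X_eng), x))                    # True
print(np.allclose(np.fft.ifft(X_ortho, norm="ortho"), x))    # True
```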

2.3 The DFT Matrix

The DFT is a linear map $\mathbf{X} = F_N \mathbf{x}$, where the DFT matrix $F_N \in \mathbb{C}^{N \times N}$ has entries:

$$(F_N)_{kn} = \omega_N^{-kn}, \quad k, n = 0, 1, \ldots, N-1$$

The matrix is fully specified by its single generator $\omega_N$. Written out explicitly for $N=4$:

$$F_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix}$$

Unitarity. The DFT matrix satisfies $F_N F_N^* = N I_N$, where $F_N^*$ is the conjugate transpose. Equivalently, the normalized DFT matrix $\frac{1}{\sqrt{N}}F_N$ is unitary:

$$\frac{1}{\sqrt{N}}F_N \cdot \left(\frac{1}{\sqrt{N}}F_N\right)^* = I_N$$

Proof. The $(k, l)$ entry of $F_N F_N^*$ is:

$$(F_N F_N^*)_{kl} = \sum_{n=0}^{N-1} \omega_N^{-kn}\, \overline{\omega_N^{-ln}} = \sum_{n=0}^{N-1} \omega_N^{(l-k)n} = \begin{cases} N & \text{if } k=l \\ 0 & \text{if } k \neq l \end{cases}$$

by the orthogonality of roots of unity. Therefore $F_N F_N^* = NI$, so $F_N^{-1} = \frac{1}{N}F_N^*$, confirming the IDFT formula: $\mathbf{x} = \frac{1}{N}F_N^* \mathbf{X}$.
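A short numerical confirmation of this identity (the size $N = 8$ is arbitrary):

```python
import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # (F_N)_{kn} = omega_N^{-kn}

print(np.allclose(F @ F.conj().T, N * np.eye(N)))      # True: F F* = N I
print(np.allclose(np.linalg.inv(F), F.conj().T / N))   # True: F^{-1} = F* / N
```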

Key structural properties of $F_N$:

  • Vandermonde structure: $F_N$ is a Vandermonde matrix with nodes $\omega_N^0, \omega_N^{-1}, \ldots, \omega_N^{-(N-1)}$
  • Symmetric: $(F_N)_{kn} = (F_N)_{nk}$ - the matrix is symmetric (not Hermitian)
  • Periodic indexing: all indices are taken modulo $N$, making the transform naturally circular

2.4 DFT as a Change of Basis

The columns of $F_N^*$ (equivalently, the rows of $F_N$ conjugated) are the Fourier basis vectors:

$$\mathbf{f}_k = \frac{1}{\sqrt{N}}\begin{bmatrix} 1 \\ \omega_N^k \\ \omega_N^{2k} \\ \vdots \\ \omega_N^{(N-1)k} \end{bmatrix} \in \mathbb{C}^N, \quad k = 0, 1, \ldots, N-1$$

These form an orthonormal basis for $\mathbb{C}^N$: $\langle \mathbf{f}_k, \mathbf{f}_l \rangle = \delta_{kl}$.

The DFT coefficient $X[k]/\sqrt{N}$ is the coordinate of $\mathbf{x}$ in the direction $\mathbf{f}_k$:

$$\frac{X[k]}{\sqrt{N}} = \left\langle \mathbf{x}, \mathbf{f}_k \right\rangle = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x[n]\,\omega_N^{-nk}$$

Fourier basis as "frequency detector" vectors: $\mathbf{f}_k$ oscillates at exactly $k$ cycles across the $N$ samples. If $\mathbf{x}$ is itself a pure sinusoid at frequency $k_0$, then $|X[k]|$ is zero for all $k \neq k_0$ and equals $N$ at $k = k_0$. The DFT "detects" which frequencies are present by computing inner products with all $N$ basis sinusoids simultaneously.

Contrast with DTFT (Discrete-Time Fourier Transform): The DTFT is defined for infinite sequences as $X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}$ and produces a continuous spectrum over $\omega \in [0, 2\pi)$. The DFT is a sampled version of the DTFT, evaluating it at $N$ equally-spaced frequencies. The DFT is the only version that is both finite and exactly invertible.

2.5 Relation to the Continuous Fourier Transform

The DFT approximates the continuous Fourier Transform of a sampled signal. Specifically, if $x(t)$ is bandlimited to $[-f_s/2, f_s/2)$ and sampled at rate $f_s$:

$$X[k] \approx \frac{1}{T_s}\,\hat{x}\!\left(\frac{k}{N T_s}\right) = \frac{1}{T_s}\,\hat{x}(k\,\Delta f)$$

where $\Delta f = f_s/N$ is the frequency bin spacing; the relation follows from viewing $T_s \sum_n x(nT_s)\,e^{-2\pi i (k\Delta f)\, n T_s} = T_s\, X[k]$ as a Riemann-sum approximation of the Fourier integral. The DFT thus provides a discretized snapshot of the continuous spectrum at $N$ frequency points.

Three key approximation errors arise:

  1. Aliasing: frequencies above $f_s/2$ fold back into $[0, f_s/2)$ - see Section 6.3
  2. Leakage: the signal is observed for only $N$ samples, equivalent to multiplying by a rectangular window, causing sidelobe contamination in the spectrum - see Section 5
  3. Picket fence: the DFT only evaluates the spectrum at $N$ discrete frequencies, potentially missing peaks between bins - see Section 6.2

3. Properties of the DFT

All DFT properties follow from linearity of the sum and the orthogonality of roots of unity. We state each property, prove it, and note its discrete-specific character where it differs from the continuous FT.

3.1 Linearity

$$\mathcal{F}\{ax[n] + by[n]\}[k] = a\,X[k] + b\,Y[k]$$

Proof. Immediate from linearity of summation: $\sum_n (ax[n]+by[n])\,\omega_N^{-nk} = a\sum_n x[n]\,\omega_N^{-nk} + b\sum_n y[n]\,\omega_N^{-nk}$.

Linearity makes the DFT a linear operator on $\mathbb{C}^N$, represented by the matrix $F_N$.

3.2 Circular Shift

If $y[n] = x[n - m \bmod N]$ (shift $\mathbf{x}$ right by $m$ positions with wrap-around), then:

$$Y[k] = \omega_N^{-mk}\, X[k]$$

Proof.

$$Y[k] = \sum_{n=0}^{N-1} x[n-m]\,\omega_N^{-nk} \overset{l=n-m}{=} \sum_{l=0}^{N-1} x[l]\,\omega_N^{-(l+m)k} = \omega_N^{-mk} \sum_{l=0}^{N-1} x[l]\,\omega_N^{-lk} = \omega_N^{-mk}\, X[k]$$

Important distinction from continuous case. In the continuous FT, a time shift by $\tau$ introduces the factor $e^{-2\pi i\xi\tau}$. In the DFT, the shift is circular (modular): shifting right pushes the last sample to the first position. If you apply a linear (non-circular) shift to a finite-length signal, the result is NOT simply the DFT multiplied by a phase factor - truncation effects occur. This is a frequent source of confusion when applying the shift property carelessly.
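A minimal check of the circular-shift property, using np.roll (which implements exactly the wrap-around shift assumed here):

```python
import numpy as np

N, m = 16, 3
x = np.random.default_rng(2).standard_normal(N)

X = np.fft.fft(x)
Y = np.fft.fft(np.roll(x, m))                  # circular shift right by m

k = np.arange(N)
phase = np.exp(-2j * np.pi * m * k / N)        # omega_N^{-mk}
print(np.allclose(Y, phase * X))               # True: circular shift <-> phase ramp
```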

Application. In signal processing, circular shift models delay in a circular buffer. In machine learning, circular convolution (Section 3.6) exploits this property to compute convolutions via FFT.

3.3 Modulation (Frequency Shift)

If $y[n] = \omega_N^{mn}\, x[n]$ (multiply by a complex sinusoid), then:

$$Y[k] = X[k - m \bmod N]$$

This is the dual of the circular shift property: multiplication in time = shift in frequency.

Application. Frequency-shift keying (FSK) in communications. In transformers, rotary position encoding (RoPE) multiplies embeddings by complex exponentials - this is modulation in the DFT sense, shifting the frequency-domain representation of each token by its position.

3.4 Conjugate Symmetry for Real Inputs

If $x[n] \in \mathbb{R}$ for all $n$, then:

$$X[N-k] = X[k]^*, \quad k = 0, 1, \ldots, N-1$$

Proof. $X[N-k] = \sum_n x[n]\,\omega_N^{-n(N-k)} = \sum_n x[n]\,\omega_N^{nk} = \sum_n x[n]\,\overline{\omega_N^{-nk}} = \overline{X[k]}$, using $x[n] \in \mathbb{R}$ so $x[n] = \overline{x[n]}$ and $\omega_N^{nN} = 1$.

Consequence. Only $\lfloor N/2 \rfloor + 1$ DFT coefficients $X[0], X[1], \ldots, X[N/2]$ are independent; the rest are determined by conjugate symmetry. NumPy's rfft exploits this to return only the non-redundant half, saving memory and computation. For AI applications working with real-valued signals (audio, time-series), always use rfft/irfft.

DC and Nyquist bins: $X[0] = \sum_n x[n] \in \mathbb{R}$ (always real, proportional to the mean). For even $N$: $X[N/2] = \sum_n x[n](-1)^n \in \mathbb{R}$ (always real, proportional to the alternating sum).
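A small sketch showing how rfft relates to the full fft for a real input, and that the DC and Nyquist bins come out (numerically) real:

```python
import numpy as np

N = 8
x = np.random.default_rng(3).standard_normal(N)     # real-valued signal

X_full = np.fft.fft(x)            # N complex values, conjugate-symmetric
X_half = np.fft.rfft(x)           # only the N/2 + 1 non-redundant values

print(len(X_half))                                    # 5  (= N/2 + 1)
print(np.allclose(X_full[:N//2 + 1], X_half))         # True
print(np.allclose(X_full[N-1], np.conj(X_full[1])))   # True: X[N-k] = X[k]*

# DC and Nyquist bins are real for real input
print(abs(X_half[0].imag) < 1e-12, abs(X_half[-1].imag) < 1e-12)  # True True
```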

3.5 Parseval's Theorem for the DFT

$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2$$

Proof. Since $\frac{1}{\sqrt{N}}F_N$ is unitary, it preserves the $\ell^2$ norm:

$$\lVert \mathbf{x} \rVert^2 = \left\lVert \frac{1}{\sqrt{N}}F_N \mathbf{x} \right\rVert^2 = \frac{1}{N}\lVert \mathbf{X} \rVert^2$$

Interpretation. Energy is perfectly preserved by the DFT - the total energy in the time domain equals the total energy in the frequency domain (up to the $1/N$ normalization factor). This is the discrete analog of Plancherel's theorem from Section 02.

Application. The power spectral density (PSD) is $|X[k]|^2 / N$, which by Parseval sums to the total signal power. In Whisper's mel spectrogram, the log power $\log(|X[k]|^2)$ of each FFT bin is computed and then summed over mel filterbanks.
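Parseval's theorem is a one-line numerical check:

```python
import numpy as np

x = np.random.default_rng(4).standard_normal(1024)
X = np.fft.fft(x)

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)

print(np.isclose(energy_time, energy_freq))   # True: equal up to rounding error
```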

3.6 Circular Convolution - Preview

The most important DFT property is the circular convolution theorem: pointwise multiplication in frequency corresponds to circular convolution in time.

Definition. The circular (cyclic) convolution of $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{C}^N$ is:

$$(x \circledast y)[n] = \sum_{m=0}^{N-1} x[m]\, y[n-m \bmod N]$$

Circular Convolution Theorem. $\mathcal{F}\{x \circledast y\}[k] = X[k] \cdot Y[k]$.

Preview: This theorem is the entire foundation of FFT-based convolution. The full treatment - including the distinction between circular and linear convolution, overlap-add/save for long signals, filter design, and the connection to CNNs - is the subject of Section 20-04 Convolution Theorem. Here we note the fact; the applications follow in Section 04.

Why circular? The DFT implicitly assumes the signal is periodic with period $N$. The product $X[k]\,Y[k]$ is the DFT of the circular (not linear) convolution of $\mathbf{x}$ and $\mathbf{y}$. To compute linear convolution via FFT, zero-pad both sequences to length $N + M - 1$ before taking the DFT (Section 6.2), as the sketch below illustrates.
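A sketch of both options: multiplying un-padded DFTs gives the circular convolution, while zero-padding to length $L_1 + L_2 - 1$ recovers the linear convolution (the signal values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])          # length L1 = 4
h = np.array([1.0, -1.0, 0.5])               # length L2 = 3

# Circular convolution (length 4): FFT multiply without padding
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=len(x))).real

# Linear convolution (length L1 + L2 - 1 = 6): zero-pad both first
M = len(x) + len(h) - 1
lin = np.fft.ifft(np.fft.fft(x, n=M) * np.fft.fft(h, n=M)).real

print(np.allclose(lin, np.convolve(x, h)))   # True: matches direct linear convolution
print(circ)                                  # wrap-around contaminates the first samples
```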

3.7 DFT Properties Master Table

| Property | Time domain $x[n]$ | Frequency domain $X[k]$ |
|---|---|---|
| Linearity | $ax[n] + by[n]$ | $aX[k] + bY[k]$ |
| Circular shift | $x[n-m \bmod N]$ | $\omega_N^{-mk}\, X[k]$ |
| Modulation | $\omega_N^{mn}\, x[n]$ | $X[k-m \bmod N]$ |
| Conjugate symmetry | $x[n] \in \mathbb{R}$ | $X[N-k] = X[k]^*$ |
| Time reversal | $x[-n \bmod N]$ | $X[-k \bmod N] = X[N-k]$ |
| Conjugation | $x[n]^*$ | $X[-k \bmod N]^* = X[N-k]^*$ |
| Circular convolution | $(x \circledast y)[n]$ | $X[k]\cdot Y[k]$ |
| Pointwise product | $x[n] \cdot y[n]$ | $\frac{1}{N}(X \circledast Y)[k]$ |
| Parseval | $\sum_n \lvert x[n] \rvert^2$ | $\frac{1}{N}\sum_k \lvert X[k] \rvert^2$ |
| Duality | $X[n]$ | $N\, x[-k \bmod N]$ |

4. The Fast Fourier Transform Algorithm

4.1 Complexity of Naive DFT

Direct evaluation of $X[k] = \sum_{n=0}^{N-1} x[n]\,\omega_N^{-nk}$ for all $k = 0, \ldots, N-1$ requires:

  • $N$ complex multiplications per output bin (multiplying each $x[n]$ by $\omega_N^{-nk}$)
  • $N-1$ complex additions per bin
  • Total: $N^2$ complex multiplications, $N(N-1)$ complex additions
  • Each complex multiplication = 4 real multiplications + 2 additions

For $N = 1024$: about $10^6$ complex multiplications. For $N = 65{,}536$: about $4 \times 10^9$. At roughly $10^9$ operations per second, a 65,536-point DFT takes about 4 seconds. Audio applications need hundreds per second. The naive DFT is simply not usable.

4.2 The Cooley-Tukey Insight: Divide and Conquer

Assume $N = 2^m$ (a power of 2). Split the sum into even-indexed and odd-indexed samples, pulling the common factor $\omega_N^{-k}$ out of the odd branch:

$$X[k] = \sum_{n=0}^{N-1} x[n]\,\omega_N^{-nk} = \underbrace{\sum_{r=0}^{N/2-1} x[2r]\,\omega_N^{-2rk}}_{E[k]} + \omega_N^{-k}\underbrace{\sum_{r=0}^{N/2-1} x[2r+1]\,\omega_N^{-2rk}}_{O[k]}$$

Observe that $\omega_N^{-2rk} = e^{-2\pi i(2r)k/N} = e^{-2\pi i rk/(N/2)} = \omega_{N/2}^{-rk}$. Therefore:

$$E[k] = \sum_{r=0}^{N/2-1} x[2r]\,\omega_{N/2}^{-rk} = \text{DFT}_{N/2}\!\left(\{x[0], x[2], x[4], \ldots\}\right)[k]$$
$$O[k] = \sum_{r=0}^{N/2-1} x[2r+1]\,\omega_{N/2}^{-rk} = \text{DFT}_{N/2}\!\left(\{x[1], x[3], x[5], \ldots\}\right)[k]$$

Both $E[k]$ and $O[k]$ are $(N/2)$-point DFTs, and both are periodic in $k$ with period $N/2$. Combined with $\omega_N^{-(k+N/2)} = -\omega_N^{-k}$, this gives, for $k = 0, \ldots, N/2-1$:

$$\boxed{X[k] = E[k] + \omega_N^{-k}\, O[k]} \qquad \boxed{X[k + N/2] = E[k] - \omega_N^{-k}\, O[k]}$$

This is the Cooley-Tukey butterfly: two $N/2$-point DFTs plus $N/2$ complex multiplications (by the twiddle factors $\omega_N^{-k}$) yield the full $N$-point DFT.

Recursion: Apply the same split to each $N/2$-point DFT, then to each $N/4$-point DFT, and so on, until we reach 2-point DFTs (which are trivial: $X[0] = x[0]+x[1]$, $X[1] = x[0]-x[1]$).
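The recursion translates almost line for line into code. A minimal, unoptimized radix-2 sketch for power-of-2 lengths, checked against np.fft.fft:

```python
import numpy as np

def fft_radix2(x):
    """Recursive Cooley-Tukey DIT FFT. Requires len(x) to be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    E = fft_radix2(x[0::2])                       # DFT of even-indexed samples
    O = fft_radix2(x[1::2])                       # DFT of odd-indexed samples
    k = np.arange(N // 2)
    twiddle = np.exp(-2j * np.pi * k / N)         # omega_N^{-k}
    return np.concatenate([E + twiddle * O,       # X[k]       = E[k] + W^k O[k]
                           E - twiddle * O])      # X[k + N/2] = E[k] - W^k O[k]

x = np.random.default_rng(5).standard_normal(64)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))  # True
```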

4.3 Butterfly Diagram

The Cooley-Tukey recursion is beautifully represented as a signal flow graph called the butterfly diagram. For an 8-point DFT ($N=8$, $\log_2 8 = 3$ stages):

DFT-8 BUTTERFLY DIAGRAM (Decimation in Time)
========================================================================

  Input            Stage 1          Stage 2          Stage 3      Output
  (bit-reversed)   2-point DFTs     4-point DFTs     8-point DFT

  x[0] ----+--------------+---------------+-------- X[0]
           |              |               |
  x[4] ----+ W0   ...     |               |
           |              |               |
  x[2] ----+              | W0   ...      |
           |              |               |
  x[6] ----+              |               |
           |              |               |
  x[1] ----+              |               | W0
           |              |               |
  x[5] ----+              |               | W1
           |              |               |
  x[3] ----+              |               | W2
           |              |               |
  x[7] ----+--------------+---------------+-------- X[7]

  Each butterfly: X[k]       = E[k] + W^k * O[k]
                  X[k + N/2] = E[k] - W^k * O[k]

  W^k = ω_N^{-k} = e^{-2πik/N}   (twiddle factor)

========================================================================

Each butterfly is a 2-input, 2-output operation:

  a ---+--------------- a + W*b
       | \           /
       |  \         /
       |   (butterfly crossing)
       |  /         \
       | /           \
  b ---+--------------- a - W*b

For $N = 2^m$, the diagram has $m = \log_2 N$ stages, each containing $N/2$ butterflies. Total butterflies: $\frac{N}{2}\log_2 N$.

4.4 Decimation in Time vs Decimation in Frequency

Decimation in Time (DIT): Split the input by even/odd indices. The input must be in bit-reversed order (see Section 4.5), while the output emerges in natural order. This is the form described above, and the most common in practice.

Decimation in Frequency (DIF): Split the output by even/odd frequencies. The input is in natural order, while the output emerges in bit-reversed order. The butterfly computation slightly differs but the complexity is identical.

Both variants require exactly the same number of operations. The choice is often determined by memory access patterns or hardware constraints.

In-place computation: The butterfly computation can be performed in-place - the two outputs overwrite the two inputs without needing extra memory. This makes the FFT extremely memory-efficient for large NN.

4.5 Bit-Reversal Permutation

The DIT-FFT requires the input in bit-reversed order. For $N = 8$ (3-bit indices):

| Natural index | Binary | Bit-reversed binary | Bit-reversed index |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |

Why bit-reversal arises: Each stage of the DIT recursion separates samples by their last bit (even vs odd) - what was the last bit of the original index becomes the first bit at the deepest recursion level. After $\log_2 N$ stages of even-odd splitting, the indices are fully bit-reversed.

Efficient computation: Bit-reversal can be computed in $O(N)$ time using integer bit-reversal operations or a simple loop with bin() in Python. Hardware FFT chips often include dedicated bit-reversal circuits.
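A minimal sketch of the permutation (using Python string formatting rather than bin(), purely for brevity):

```python
import numpy as np

def bit_reversal_permutation(N: int) -> np.ndarray:
    """Indices of 0..N-1 in bit-reversed order, for N a power of 2."""
    bits = N.bit_length() - 1
    return np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)])

print(bit_reversal_permutation(8))   # [0 4 2 6 1 5 3 7] - matches the table above
```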

4.6 Complexity Analysis: O(N log N)

Let $T(N)$ be the number of complex multiplications required by the FFT. The recursion is:

$$T(N) = 2\,T(N/2) + \frac{N}{2}$$

where $N/2$ twiddle-factor multiplications are needed to combine two $N/2$-point DFTs. With $T(1) = 0$:

Solving the recurrence:

$$T(N) = 2T(N/2) + N/2$$
$$= 2\bigl[2T(N/4) + N/4\bigr] + N/2 = 4T(N/4) + N/2 + N/2$$
$$= 4\bigl[2T(N/8) + N/8\bigr] + N = 8T(N/8) + 3N/2$$
$$\vdots$$
$$= N\,T(1) + \frac{N}{2}\log_2 N = \frac{N}{2}\log_2 N$$

Total operations: $\frac{N}{2}\log_2 N$ complex multiplications and $N\log_2 N$ complex additions. This is $O(N \log N)$.

Comparison with naive DFT ($N^2$ multiplications):

| $N$ | FFT / DFT ratio |
|---|---|
| 64 | $3/64 \approx 4.7\%$ |
| 1,024 | $5/1024 \approx 0.49\%$ |
| $10^6$ | $\approx 0.002\%$ |

For large NN, the FFT uses less than 0.01% of the operations required by the naive DFT.

Empirical validation: The $O(N \log N)$ curve can be confirmed by timing numpy.fft.fft for $N = 2^k$, $k = 6, \ldots, 20$ - the log-log plot should have slope $\approx 1$, since $N \log N$ grows only slightly faster than linear for moderate $N$.
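A rough version of that timing experiment; absolute numbers depend entirely on the machine, only the near-linear trend matters:

```python
import time
import numpy as np

for k in range(6, 21, 2):
    N = 2 ** k
    x = np.random.default_rng(0).standard_normal(N)
    t0 = time.perf_counter()
    for _ in range(10):                    # average over a few runs
        np.fft.fft(x)
    elapsed = (time.perf_counter() - t0) / 10
    print(f"N = 2^{k:2d}   time per FFT = {elapsed * 1e6:9.1f} us")
```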

4.7 Variants and Extensions

Mixed-radix FFT. The pure radix-2 algorithm requires $N$ to be a power of 2. Real-world signals rarely have power-of-2 lengths. Mixed-radix algorithms factor $N = N_1 \times N_2 \times \cdots \times N_r$ into small prime factors and apply a generalized Cooley-Tukey recursion. With $N = N_1 N_2$, the DFT can be decomposed into $N_1$ DFTs of size $N_2$ and $N_2$ DFTs of size $N_1$. Common radices: 2, 3, 4, 5, 7, 8.

Split-radix FFT. Combines radix-2 and radix-4 butterflies to achieve approximately $\frac{1}{3}N\log_2 N$ complex multiplications - long the best known operation count for power-of-2 lengths.

Prime-length algorithms:

  • Rader's algorithm (1968): Converts a prime-length DFT into a circular convolution of composite length, solvable by FFT.
  • Bluestein's algorithm (1970): Converts any DFT into a convolution via the identity $nk = \binom{n+k}{2} - \binom{n}{2} - \binom{k}{2}$, enabling FFT-based computation for any $N$.

Real-valued FFT (rfft). For real inputs, conjugate symmetry (Section 3.4) means only $N/2+1$ output values are independent. np.fft.rfft computes only these, roughly halving computation and memory.

FFTW (Fastest Fourier Transform in the West). FFTW (Frigo & Johnson, 2005) automatically selects the best algorithm for any $N$ and hardware through a "plan" computed at initialization time. It achieves near-optimal performance across a wide range of $N$ and machine architectures. MATLAB and Julia use FFTW under the hood; NumPy and SciPy ship the comparable pocketfft library.

For AI: PyTorch's torch.fft.fft uses CUDA-accelerated FFT (cuFFT) on GPU. For sequence lengths that are not powers of 2, cuFFT uses mixed-radix algorithms. When tuning sequence lengths for FFT-based models (FNet, FNO, FlashFFTConv), choosing $N = 2^k$ or $N = 2^a \cdot 3^b \cdot 5^c$ avoids prime-length slowdowns.


5. Spectral Leakage and Windowing

5.1 The Leakage Problem

The DFT assumes that the $N$ samples represent one complete period of a periodic signal. In reality, we observe a finite segment of a continuous signal - we multiply by a rectangular window that is 1 inside $[0, N-1]$ and 0 outside. In frequency, multiplication by a rectangular window corresponds to convolution with the DFT of the rectangle - the Dirichlet kernel:

$$W_{\text{rect}}[k] = \sum_{n=0}^{N-1} \omega_N^{-nk} = \begin{cases} N & k=0 \\ \dfrac{1-\omega_N^{-Nk}}{1-\omega_N^{-k}} = \dfrac{\sin(\pi k)}{\sin(\pi k/N)}\,e^{-i\pi k(N-1)/N} & k \neq 0 \end{cases}$$

For non-integer frequencies (frequencies that do not land exactly on a DFT bin), the Dirichlet kernel's large sidelobes spread energy from the true frequency into adjacent bins - this is spectral leakage.

Example. Suppose the signal is $x(t) = \cos(2\pi f_0 t)$ with $f_0 = 3.5\, f_s/N$ (a non-integer number of cycles in the window). The true spectrum has just two peaks at $\pm f_0$. The DFT shows energy spread across all $N$ bins, with the main peak at the nearest bin ($k=3$ or $k=4$) and significant sidelobe contamination at all other bins.

Consequence for AI: In Whisper's mel spectrogram, a sharp onset (e.g., a consonant burst) at a non-bin frequency would appear smeared across multiple mel filterbanks. Window choice directly affects how well transient events are separated in the spectrogram.
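A small sketch of the leakage effect, comparing an on-bin and an off-bin tone (the sampling rate and length are illustrative): the on-bin tone occupies a single bin, while the off-bin tone spreads across essentially the whole spectrum.

```python
import numpy as np

fs, N = 64.0, 64
n = np.arange(N)

on_bin  = np.cos(2 * np.pi * (3.0 * fs / N) * n / fs)   # exactly 3 cycles in the window
off_bin = np.cos(2 * np.pi * (3.5 * fs / N) * n / fs)   # 3.5 cycles: worst-case leakage

X_on  = np.abs(np.fft.rfft(on_bin))
X_off = np.abs(np.fft.rfft(off_bin))

print(np.sum(X_on > 1e-9))    # 1  -> all energy sits in bin k = 3
print(np.sum(X_off > 1e-9))   # 33 -> energy leaks into essentially every bin
```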

5.2 Window Functions

A window function $w[n]$ tapers the signal to zero at its endpoints, reducing the discontinuity that causes leakage. The windowed DFT is:

$$X_w[k] = \sum_{n=0}^{N-1} w[n]\, x[n]\, \omega_N^{-nk}$$

The spectrum $X_w$ is the DFT of $w[n]\,x[n]$ - by the product property (Section 3.7), this equals the circular convolution of $X$ and $W$ (normalized by $1/N$). Windows with smaller sidelobes produce less leakage at the cost of a wider main lobe (reduced frequency resolution).

Standard window functions (all defined for $n = 0, 1, \ldots, N-1$):

Rectangular (Boxcar):

$$w[n] = 1$$
  • Main lobe width: $4\pi/N$ (narrowest)
  • Peak sidelobe: $-13\,\mathrm{dB}$ (highest leakage)
  • Use case: when the signal exactly fits $N$ samples; spectral analysis of periodic signals at bin frequencies

Hann (Hanning):

$$w[n] = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right)$$
  • Main lobe width: $8\pi/N$
  • Peak sidelobe: $-31.5\,\mathrm{dB}$
  • Use case: default choice for general spectral analysis; used in Whisper STFT

Hamming:

$$w[n] = 0.54 - 0.46\cos\frac{2\pi n}{N-1}$$
  • Main lobe width: $8\pi/N$
  • Peak sidelobe: $-42.7\,\mathrm{dB}$
  • Use case: narrowband applications requiring low peak sidelobes; same main-lobe width as Hann, but slower sidelobe rolloff

Blackman:

$$w[n] = 0.42 - 0.5\cos\frac{2\pi n}{N-1} + 0.08\cos\frac{4\pi n}{N-1}$$
  • Main lobe width: $12\pi/N$
  • Peak sidelobe: $-58\,\mathrm{dB}$
  • Use case: high-dynamic-range spectral analysis

Kaiser:

$$w[n] = \frac{I_0\!\left(\beta\sqrt{1-(2n/(N-1)-1)^2}\right)}{I_0(\beta)}$$

where $I_0$ is the zeroth-order modified Bessel function and $\beta$ is a shape parameter.

  • $\beta = 0$: rectangular; $\beta \approx 5$: similar to Hamming; $\beta \approx 8.6$: similar to Blackman
  • Near-optimal: the Kaiser window closely approximates the prolate-spheroidal (DPSS) window, which maximizes main-lobe energy concentration for a given bandwidth
  • Use case: when precise sidelobe control is required; used in scientific spectral analysis and filter design
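The peak-sidelobe figures quoted above can be estimated empirically from the window transforms. A rough sketch using NumPy's built-in window functions (the main-lobe detection is deliberately crude):

```python
import numpy as np

N = 64
windows = {
    "rectangular": np.ones(N),
    "hann":        np.hanning(N),
    "hamming":     np.hamming(N),
    "blackman":    np.blackman(N),
    "kaiser(8.6)": np.kaiser(N, 8.6),
}

for name, w in windows.items():
    # Zero-pad heavily so the sidelobe structure is finely sampled
    W = np.abs(np.fft.rfft(w, n=16 * N))
    W /= W[0]                                    # normalize the main-lobe peak to 1
    # Crude estimate: first local minimum after the peak ends the main lobe
    main_lobe_end = np.argmax(np.diff(W) > 0)
    sidelobe_db = 20 * np.log10(W[main_lobe_end:].max())
    print(f"{name:12s} peak sidelobe ~ {sidelobe_db:6.1f} dB")
```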

5.3 The Main-Lobe/Side-Lobe Tradeoff

Every window function faces a fundamental constraint: main-lobe width (in frequency bins) and side-lobe level (in dB) cannot be minimized simultaneously. This is a consequence of the uncertainty principle applied to finite sequences.

| Window | Main-lobe width (bins) | Peak sidelobe (dB) | Sidelobe rolloff (dB/oct) |
|---|---|---|---|
| Rectangular | 2 | -13 | -6 |
| Hann | 4 | -31.5 | -18 |
| Hamming | 4 | -42.7 | -6 |
| Blackman | 6 | -58 | -18 |
| Blackman-Harris | 8 | -92 | -6 |
| Kaiser ($\beta=8.6$) | 8 | -80 | -6 |

Coherent gain: the window amplitude gain, $\sum_n w[n]/N$. Windows with lower sidelobes also have lower coherent gain (the main peak is smaller), which must be corrected by dividing by the coherent gain when measuring amplitudes.

Equivalent noise bandwidth (ENBW): the width of a rectangular window that would pass the same noise power. Hann has ENBW $= 1.5$ bins - it is effectively 1.5 times wider than the rectangular window for noise measurements.

5.4 Scalloping Loss and the Picket Fence Effect

The DFT evaluates the spectrum at exactly $N$ uniformly-spaced bins. If a sinusoid's true frequency $f_0$ falls exactly halfway between two bins (at $k + 0.5$ in bin units), then both neighboring bins are equally attenuated. The worst-case attenuation - the scalloping loss - depends on the window:

  • Rectangular: $-3.9\,\mathrm{dB}$ (worst case for nearest-bin amplitude estimate)
  • Hann: $-1.4\,\mathrm{dB}$
  • Blackman: $-0.8\,\mathrm{dB}$

This is the picket fence effect: the DFT spectrum looks like a picket fence - it shows signal energy at the posts (bins) but may miss peaks in the gaps between posts.

Mitigation strategies:

  1. Zero-padding (Section 6.2): append zeros to increase $N$ without changing the observation window, interpolating the spectrum between bins. This does NOT improve frequency resolution (the resolution limit $1/T$ is unchanged) but allows finer frequency estimation.

  2. Parabolic interpolation: fit a parabola to the peak bin and its two neighbors; the parabola's maximum is a better estimate of the true frequency (see the sketch after this list).

  3. Quinn's estimator and Jacobsen's estimator: closed-form formulas giving sub-bin frequency estimates with low computational cost.
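A sketch combining strategies 1 and 2: zero-pad to refine the grid, then parabolically interpolate the log-magnitude peak. The test frequency below is an arbitrary off-bin value chosen for illustration:

```python
import numpy as np

fs, N = 1000.0, 256
f_true = 123.4                                   # deliberately off-bin (bin width ~3.9 Hz)
n = np.arange(N)
x = np.hanning(N) * np.cos(2 * np.pi * f_true * n / fs)

pad = 8                                          # zero-pad factor: finer frequency grid
X = np.abs(np.fft.rfft(x, n=pad * N))

k = np.argmax(X)                                 # coarse peak on the padded grid
# Parabolic (quadratic) interpolation on the log magnitude around the peak
a, b, c = np.log(X[k - 1]), np.log(X[k]), np.log(X[k + 1])
delta = 0.5 * (a - c) / (a - 2 * b + c)          # sub-bin offset in padded-bin units
f_est = (k + delta) * fs / (pad * N)

print(f"true = {f_true} Hz, estimated = {f_est:.3f} Hz")
```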

5.5 Overlap-Add and Overlap-Save

For long signals (audio tracks, continuous data streams), we cannot apply a single DFT to the entire signal. We must process it in blocks. Two standard methods handle the boundary effects at block edges:

Overlap-Add (OLA):

  1. Segment the input into blocks of length $M$
  2. Window each block with $w[n]$ (length $M$)
  3. Zero-pad each block to length $N \geq M + L - 1$ (where $L$ is the filter length)
  4. Compute DFT-based processing on each block
  5. Overlap and add the outputs (adjacent outputs share $L-1$ samples)

Overlap-Save (OLS):

  1. Segment the input into blocks of length $N$ with overlap of $L-1$ samples
  2. DFT-based processing on each block (no zero-padding needed)
  3. Discard the first $L-1$ samples of each output block (the "contaminated" samples)
  4. Concatenate the remaining $N - L + 1$ samples

Both methods allow exact (not merely approximate) processing of long signals with FFT-based operations. The COLA (Constant Overlap-Add) condition for perfect reconstruction is:

$$\sum_{m=-\infty}^{\infty} w[n - mH] = C \quad \forall n$$

where $H$ is the hop size. Hann windows with 50% overlap ($H = N/2$) satisfy COLA, making them the standard for STFT-based audio processing.
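A quick numerical check of COLA for the Hann window at 50% overlap; note that the periodic Hann (denominator $N$) is required here, not NumPy's symmetric np.hanning:

```python
import numpy as np

N, H = 512, 256                                   # window length, hop = 50% overlap
n = np.arange(N)
w = 0.5 * (1 - np.cos(2 * np.pi * n / N))         # periodic Hann (denominator N)

# Sum many hop-shifted copies and check the interior is constant
length = 8 * N
acc = np.zeros(length)
for start in range(0, length - N + 1, H):
    acc[start:start + N] += w

interior = acc[N:-N]                              # ignore edges where copies are missing
print(interior.min(), interior.max())             # both ~1.0: COLA holds with C = 1
```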


6. Frequency Resolution and Zero-Padding

6.1 Frequency Resolution

The frequency resolution of the DFT is:

$$\Delta f = \frac{f_s}{N} = \frac{1}{T}$$

where $T = N/f_s$ is the total observation duration. This fundamental limit says: to distinguish two sinusoids at frequencies $f_1$ and $f_2$, you need to observe the signal for at least $T = 1/|f_1 - f_2|$ seconds. No amount of zero-padding or upsampling can circumvent this - it is a direct consequence of the uncertainty principle from Section 02.

The time-bandwidth product is fixed: $T \cdot \Delta f = 1$. You cannot simultaneously have:

  • Short observation window (good for tracking rapid changes)
  • Fine frequency resolution (good for distinguishing close frequencies)

For AI: Whisper uses a 25 ms analysis window (400 samples at 16 kHz) with 10 ms hop size, giving $\Delta f = 1/0.025 = 40$ Hz frequency resolution. This is a deliberate engineering choice: fine enough to separate speech formants (typically 100-300 Hz apart) while short enough to track fast phoneme transitions.

6.2 Zero-Padding

Zero-padding: appending $M - N$ zeros to a signal of length $N$ before computing an $M$-point DFT ($M > N$).

What zero-padding does: it computes $M$ equally-spaced samples of the continuous DTFT of the original $N$-point signal (i.e., the DTFT sampled at $M$ bins instead of $N$). This is frequency-domain interpolation - it fills in points between the original $N$ bins.

What zero-padding does NOT do: improve frequency resolution. The DTFT itself has resolution limited by $1/N$ (set by the original number of samples). Zero-padding only reveals the DTFT more finely; it cannot resolve frequencies closer than $1/N$ cycles per sample apart.
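A sketch illustrating the distinction: two tones 1 Hz apart stay merged under any amount of zero-padding, and only a longer observation separates them (all parameters are illustrative):

```python
import numpy as np

fs = 100.0
f1, f2 = 20.0, 21.0                # two tones 1 Hz apart

def two_tones(N):
    n = np.arange(N)
    return np.cos(2*np.pi*f1*n/fs) + np.cos(2*np.pi*f2*n/fs)

def significant_peaks(x, pad=1):
    """Local maxima of the Hann-windowed magnitude spectrum above 20% of its peak."""
    X = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=pad * len(x)))
    is_peak = (X[1:-1] > X[:-2]) & (X[1:-1] > X[2:]) & (X[1:-1] > 0.2 * X.max())
    return int(np.sum(is_peak))

print(significant_peaks(two_tones(64)))           # 1: T = 0.64 s cannot resolve 1 Hz
print(significant_peaks(two_tones(64), pad=16))   # 1: zero-padding interpolates, nothing more
print(significant_peaks(two_tones(512)))          # 2: a longer record (T = 5.12 s) resolves them
```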

Applications of zero-padding:

  1. Sub-bin frequency estimation (combine with interpolation): zero-padding by factor of 4-8 allows visual identification of spectral peaks with sub-bin accuracy.

  2. Linear convolution via FFT: to convolve signals of lengths $L_1$ and $L_2$ linearly (not circularly), zero-pad both to length $\geq L_1 + L_2 - 1$ before DFT multiplication (Section 3.6). The circular convolution of the zero-padded sequences equals the linear convolution of the originals.

  3. Next power-of-2: zero-pad to the nearest power of 2 to maximize FFT efficiency, e.g., 1000 -> 1024.

For AI: The Fourier Neural Operator (Section 8.2) explicitly truncates the spectrum to the lowest $K$ frequencies before learning, implicitly treating the signal as if zero-padded beyond the training domain. FlashFFTConv similarly uses zero-padding to convert long circular convolutions to linear ones.

6.3 The Nyquist-Shannon Sampling Theorem

Theorem (Shannon, 1949; Nyquist, 1928; Whittaker, 1915). If a continuous signal $x(t)$ is bandlimited - meaning $\hat{x}(\xi) = 0$ for $|\xi| > B$ - then $x(t)$ can be perfectly reconstructed from samples at rate $f_s \geq 2B$.

Why the DFT cares: if $f_s < 2B$, frequencies above $f_s/2$ alias onto lower frequencies. Specifically, a sinusoid at frequency $f_0 > f_s/2$ produces the same sample sequence as a sinusoid at frequency $f_s - f_0 < f_s/2$. In the DFT, bin $k$ and bin $N-k$ represent the same physical frequency if $f_s$ is not high enough.

The aliasing formula: a signal at frequency $f_0$ aliases to $f_{\text{alias}} = |f_0 - \text{round}(f_0/f_s) \cdot f_s|$, i.e., the nearest copy within $[0, f_s/2)$.
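A minimal demonstration of the aliasing formula (sampling rate and frequencies are illustrative):

```python
import numpy as np

fs, N = 100.0, 200                    # sample at 100 Hz for 2 seconds
t = np.arange(N) / fs

f_true = 70.0                         # above Nyquist (50 Hz)
f_alias = abs(f_true - round(f_true / fs) * fs)   # |70 - 100| = 30 Hz

x_true = np.cos(2 * np.pi * f_true * t)
x_alias = np.cos(2 * np.pi * f_alias * t)

# The two sinusoids produce identical samples (up to rounding error)...
print(np.allclose(x_true, x_alias))   # True

# ...so the DFT of the undersampled tone peaks at the aliased frequency
freqs = np.fft.rfftfreq(N, d=1/fs)
print(freqs[np.argmax(np.abs(np.fft.rfft(x_true)))])   # 30.0
```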

Connection to Poisson summation (from Section 02-7.5): the spectrum of the sampled signal is the sum of periodically-shifted copies of the continuous spectrum:

$$X_s(\xi) = f_s \sum_{k=-\infty}^{\infty} \hat{x}(\xi - k f_s)$$

If adjacent copies overlap (i.e., $f_s < 2B$), they contaminate each other - this is aliasing. If $f_s \geq 2B$, the copies are non-overlapping and the original spectrum can be recovered by applying a lowpass filter with cutoff at $f_s/2$.

Practical rule: In practice, use $f_s \geq 2.5B$ with an anti-aliasing lowpass filter at $f_s/2$ before the ADC. Audio CDs use $f_s = 44.1$ kHz because human hearing extends to $\sim 20$ kHz (requiring $f_s > 40$ kHz), with the extra 2.1 kHz providing a transition band for the anti-aliasing filter.

6.4 Practical FFT Setup Checklist

When computing an FFT-based spectral analysis, follow this sequence:

PRACTICAL FFT CHECKLIST
========================================================================

  1. CHOOSE N
     - Prefer N = 2^k (fastest FFT)
     - If not possible: N = 2^a * 3^b * 5^c (still fast)
     - N controls frequency resolution: Δf = fs / N

  2. APPLY WINDOW
     - Default: Hann window (good sidelobes, modest resolution loss)
     - High dynamic range: Blackman or Kaiser (β ≈ 8)
     - If signal is periodic with exactly N samples: rectangular

  3. COMPUTE FFT
     - Real input: use rfft (returns N/2+1 values)
     - Complex input: use fft (returns N values)

  4. BUILD FREQUENCY AXIS
     - For rfft: freqs = np.fft.rfftfreq(N, d=1/fs)   [0, fs/2]
     - For fft:  freqs = np.fft.fftfreq(N, d=1/fs)    [0 ... fs/2) then (-fs/2 ... 0)
     - After fftshift: freqs = [-fs/2, ..., 0, ..., fs/2]

  5. INTERPRET MAGNITUDE
     - |X[k]| / N  ->  amplitude of a complex sinusoid (x2 for a real sinusoid,
       since its energy splits between bins k and N-k)
     - |X[k]|^2 / N  ->  power spectral density
     - Divide by window coherent gain for absolute amplitudes

  6. HANDLE NEGATIVE FREQUENCIES
     - Use fftshift(X) to center zero-frequency
     - For real inputs: mirror the positive half (rfft returns one-sided)

========================================================================
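A worked example following the checklist end to end; the sampling rate, length, and tone are illustrative assumptions:

```python
import numpy as np

fs = 8192.0                      # assumed sampling rate (power of 2 for a clean example)
N = 4096                         # step 1: N = 2^12, so df = fs/N = 2 Hz
n = np.arange(N)
x = (0.5 * np.cos(2 * np.pi * 440.0 * n / fs)            # 440 Hz tone, amplitude 0.5
     + 0.05 * np.random.default_rng(0).standard_normal(N))  # plus a little noise

w = np.hanning(N)                # step 2: Hann window
coherent_gain = w.sum() / N      # ~0.5 for Hann; needed for absolute amplitudes

X = np.fft.rfft(x * w)                 # step 3: real input -> rfft
freqs = np.fft.rfftfreq(N, d=1/fs)     # step 4: frequency axis, 0 .. fs/2

# Step 5: amplitude at the peak (x2 for a real sinusoid, / coherent gain)
k = np.argmax(np.abs(X))
amp = 2 * np.abs(X[k]) / (N * coherent_gain)
print(f"peak at {freqs[k]:.1f} Hz, amplitude ~ {amp:.3f}")   # ~440 Hz, ~0.5
```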
