Discrete Fourier Transform and FFT: Part 1: Intuition to 6. Frequency Resolution and Zero-Padding
1. Intuition
1.1 From Continuous to Discrete: Sampling a Spectrum
Every practical computation with signals must at some point confront a fundamental constraint: computers cannot store or process continuous functions. What they can store is a finite sequence of numbers - samples of a signal taken at discrete time instants. The passage from the continuous world of the Fourier Transform to the discrete world of the DFT is therefore not a mathematical nicety but a computational necessity.
Suppose we record $N$ samples of a signal at a sampling rate of $f_s$ samples per second:

$$x[n] = x_c(n / f_s), \qquad n = 0, 1, \dots, N-1.$$

The total observation time is $T = N / f_s$. What frequencies can we see in this finite record? Two constraints emerge immediately:
- Highest observable frequency (Nyquist): any frequency above $f_s/2$ will be indistinguishable from a lower frequency due to aliasing - two distinct sinusoids will produce identical sample sequences.
- Frequency resolution: the smallest frequency difference we can distinguish is $\Delta f = 1/T = f_s/N$ - the reciprocal of the observation duration.

The DFT honours both constraints. It examines the signal at exactly $N$ equally-spaced frequency bins:

$$f_k = k \, \frac{f_s}{N}, \qquad k = 0, 1, \dots, N-1,$$

covering one full period of the periodic spectrum. The first $N/2$ bins correspond to positive frequencies $0 \le f_k < f_s/2$; the remaining bins correspond to negative frequencies wrapped around to $[-f_s/2, 0)$. This wrapping is the discrete manifestation of the Nyquist limit.
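This frequency-axis layout is exactly what NumPy's `fftfreq` produces; a minimal sketch (the values of $N$ and $f_s$ are chosen here purely for illustration):

```python
import numpy as np

N, fs = 8, 1000.0                    # illustrative: 8 samples at 1 kHz
freqs = np.fft.fftfreq(N, d=1/fs)    # bin k -> k*fs/N, negative half wrapped

# First N/2 bins: non-negative frequencies 0 .. fs/2 (exclusive)
assert np.allclose(freqs[:N // 2], [0.0, 125.0, 250.0, 375.0])
# Remaining bins: negative frequencies wrapped to -fs/2 .. -fs/N
assert np.allclose(freqs[N // 2:], [-500.0, -375.0, -250.0, -125.0])
```

Note that bin $N/2$ carries $-f_s/2$, the wrapped Nyquist frequency.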
For AI: Every neural network that processes time-series data - speech, ECG, seismic, financial - ultimately works with sampled signals. Understanding the frequency axis setup of the DFT is essential for interpreting what spectral features (mel filterbanks in Whisper, power spectral density in EEG classifiers) actually represent. Misinterpreting negative-frequency bins is a common source of subtle bugs in spectrogram preprocessing.
1.2 What the DFT Does Geometrically
The space of $N$-point complex sequences has dimension $N$. The DFT provides a complete orthogonal basis for this space - the Fourier basis - consisting of complex sinusoids at the frequencies $f_k$:

$$e_k[n] = e^{2\pi i k n / N}, \qquad n = 0, 1, \dots, N-1.$$

Each basis vector $e_k$ is a sampled complex sinusoid at frequency $k/N$ cycles per sample. The DFT coefficient $X[k]$ is the inner product of the signal with the $k$-th basis vector:

$$X[k] = \langle x, e_k \rangle = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}.$$

The DFT is therefore a change of basis from the standard basis (time domain) to the Fourier basis (frequency domain). The magnitude $|X[k]|$ measures how strongly the signal oscillates at frequency $k/N$ cycles per sample; the phase $\arg X[k]$ measures the phase of that oscillation.

The inverse DFT reconstructs the signal by superimposing all $N$ sinusoids with their computed amplitudes and phases:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{2\pi i k n / N}.$$

The factor $1/N$ comes from the normalization of the basis - the $e_k$ have squared norm $N$, not 1. Using the orthonormal basis $e_k / \sqrt{N}$, both forward and inverse transforms carry a $1/\sqrt{N}$ factor. Different software packages use different conventions; always check.

Geometric picture: if you think of the signal as a vector in $\mathbb{C}^N$, the DFT rotates this vector into a new coordinate system. The new coordinate axes are complex sinusoids. The DFT matrix is the rotation matrix (more precisely, a scaled unitary matrix). The operation is a matrix-vector product; naive evaluation takes $O(N^2)$ operations. The FFT computes this same product in $O(N \log N)$.
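The change-of-basis picture can be checked numerically. The sketch below (illustrative $N$, random input) builds the DFT matrix explicitly and compares the $O(N^2)$ matrix-vector product against `np.fft.fft`:

```python
import numpy as np

N = 8
n = np.arange(N)
# DFT matrix: F[k, m] = exp(-2*pi*i*k*m/N)  (engineering sign convention)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.default_rng(0).standard_normal(N)
X_matrix = F @ x               # O(N^2) matrix-vector product
X_fft = np.fft.fft(x)          # O(N log N) FFT - same numbers
assert np.allclose(X_matrix, X_fft)

# The scaled matrix U = F / sqrt(N) is unitary: U U* = I
U = F / np.sqrt(N)
assert np.allclose(U @ U.conj().T, np.eye(N))
```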
1.3 Why O(N log N) Changed the World
To appreciate the scale of the FFT's impact, consider the arithmetic:
| $N$ | Naive DFT ($N^2$) | FFT ($N \log_2 N$) | Speedup |
|---|---|---|---|
| 64 | 4,096 | 384 | ~11x |
| 1,024 | 1,048,576 | 10,240 | ~102x |
| 65,536 | 4,294,967,296 | 1,048,576 | ~4,096x |
| 1,048,576 | ~$10^{12}$ | ~$2 \times 10^{7}$ | ~50,000x |
Before the FFT, computing the spectrum of a 1-second audio clip sampled at 44.1 kHz was infeasible in real time. After the FFT, it takes microseconds. The entire modern infrastructure of digital audio, wireless communications (OFDM), radar, MRI, seismology, and antenna design rests on this algorithmic breakthrough.
The secret is that the DFT matrix has enormous structure: it is a Vandermonde matrix whose entries are powers of the $N$-th root of unity, and these entries satisfy recursive relationships that allow massive cancellation of work through a divide-and-conquer decomposition.
For AI (2026): Long-context language models face a quadratic bottleneck in self-attention: processing a 100K-token context with full attention requires $\sim 10^{10}$ operations. The FFT offers an alternative: if a sequence operation can be formulated as a convolution, it can be computed in $O(N \log N)$ using the FFT. This is the foundation of FNet (Lee-Thorp et al., 2022), FlashFFTConv (Fu et al., 2023), S4 (Gu et al., 2022), and the Fourier Neural Operator (Li et al., 2021). The $O(N^2)$ vs $O(N \log N)$ complexity gap, already dramatic at $N = 10^5$, becomes existential at $N = 10^6$.
1.4 Historical Timeline
The story of the FFT is a case study in mathematical rediscovery and the transformative effect of the right algorithm at the right moment.
1805 - Carl Friedrich Gauss derived what we now recognize as the FFT algorithm while computing asteroid orbits. His manuscript lay unpublished and largely unknown.
1903 - Carl Runge independently derived a related fast algorithm for trigonometric interpolation.
1942 - Danielson and Lanczos published a recursive factorization of the DFT based on even-odd splitting, establishing the core mathematical structure.
1965 - James Cooley and John Tukey published "An Algorithm for the Machine Calculation of Complex Fourier Series," the paper that launched the digital signal processing revolution. Their algorithm was motivated by Tukey's desire to detect Soviet nuclear tests through seismic monitoring. The paper, only a few pages long, is one of the most cited in the history of applied mathematics.
1966 - Gentleman and Sande introduced the "decimation in frequency" variant; Bergland gave a comprehensive tutorial; the algorithm became standard in every engineering discipline within a decade.
1997 - Frigo and Johnson released FFTW ("Fastest Fourier Transform in the West"), an adaptive library that benchmarks and selects the optimal algorithm for any $N$ and hardware. FFTW powers the FFTs of MATLAB and Julia, and FFTW-style adaptive libraries are the backbone of essentially all scientific computing FFT pipelines.
2021 - Fourier Neural Operator (Li et al.) applies the DFT inside a neural network layer to learn solution operators for PDEs in $O(N \log N)$ time per layer.
2022 - Monarch matrices (Dao et al.) parameterize FFT-like structured transforms as products of sparse butterfly matrices, enabling learnable near-FFT computations in transformer architectures.
2. Formal Definitions
2.1 The N-Point DFT
Definition (DFT). Let $x[0], x[1], \dots, x[N-1] \in \mathbb{C}$ be a finite sequence. The $N$-point Discrete Fourier Transform of $x$ is the sequence $X[0], \dots, X[N-1]$ defined by:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N} = \sum_{n=0}^{N-1} x[n]\, \omega_N^{-kn}, \qquad k = 0, 1, \dots, N-1,$$

where $\omega_N = e^{2\pi i / N}$ is the primitive $N$-th root of unity.

The complex number $e^{-2\pi i k n / N}$ is a sampled complex sinusoid at frequency $k/N$ cycles per sample, evaluated at time $n$. The DFT coefficient $X[k]$ measures the complex amplitude (magnitude and phase) of the frequency-$k$ component of $x$.

Sign convention. The minus sign in the exponent is the physics/engineering convention (analysis kernel $e^{-2\pi i k n / N}$). Some mathematical texts use the positive sign for the forward transform; this merely relabels forward and inverse. We follow NumPy/SciPy: the forward DFT has the $-$ sign.

Roots of unity. The values $\omega_N^j = e^{2\pi i j / N}$, $j = 0, \dots, N-1$, are the $N$-th roots of unity - $N$ equally spaced points on the unit circle in $\mathbb{C}$, starting at 1 and rotating counterclockwise by $2\pi/N$ at each step. They satisfy:

$$\sum_{n=0}^{N-1} \omega_N^{jn} = \begin{cases} N, & j \equiv 0 \pmod{N} \\ 0, & \text{otherwise.} \end{cases}$$

This orthogonality of roots of unity is the key identity underlying all DFT properties and the inversion formula.
Standard examples for small $N$:
$N = 2$: $\omega_2 = -1$. The 2-point DFT is:

$$X[0] = x[0] + x[1], \qquad X[1] = x[0] - x[1].$$

This is the Hadamard/Walsh transform in 2D - just a sum and difference.

$N = 4$: $\omega_4 = i$. The 4-point DFT:

$$\begin{aligned} X[0] &= x[0] + x[1] + x[2] + x[3] \\ X[1] &= x[0] - i\,x[1] - x[2] + i\,x[3] \\ X[2] &= x[0] - x[1] + x[2] - x[3] \\ X[3] &= x[0] + i\,x[1] - x[2] - i\,x[3] \end{aligned}$$

Non-example: An infinite sequence $x[n]$, $n \in \mathbb{Z}$, does NOT have a DFT - you must first window or truncate it to $N$ points. Applying the DFT directly to an infinite sequence is a category error; what you would compute is the $z$-transform or DTFT (Discrete-Time Fourier Transform), which is distinct from the DFT.
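A direct transcription of the definition makes a useful reference implementation; this sketch checks the $N = 2$ example above and agreement with NumPy (the helper name `naive_dft` is ours, not a library routine):

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] e^{-2 pi i k n / N}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

# N = 2: sum and difference
assert np.allclose(naive_dft([1.0, 2.0]), [3.0, -1.0])
# Agreement with NumPy's FFT on a random input
x = np.random.default_rng(1).standard_normal(16)
assert np.allclose(naive_dft(x), np.fft.fft(x))
```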
2.2 Inverse DFT and Normalization Conventions
Definition (IDFT). The Inverse Discrete Fourier Transform is:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{2\pi i k n / N}, \qquad n = 0, 1, \dots, N-1.$$

Verification. Substitute the DFT into the IDFT:

$$\frac{1}{N} \sum_{k=0}^{N-1} \left( \sum_{m=0}^{N-1} x[m]\, e^{-2\pi i k m / N} \right) e^{2\pi i k n / N} = \sum_{m=0}^{N-1} x[m] \cdot \frac{1}{N} \sum_{k=0}^{N-1} e^{2\pi i k (n-m) / N} = x[n],$$

using the orthogonality of roots of unity in the penultimate step (the inner sum equals $N$ when $m = n$ and $0$ otherwise).
Three normalization conventions (all valid, software-dependent):
| Convention | Forward DFT | Inverse DFT | Used by |
|---|---|---|---|
| Engineering (default) | no prefactor | $\frac{1}{N}$ | NumPy, SciPy, MATLAB |
| Symmetric (unitary) | $\frac{1}{\sqrt{N}}$ | $\frac{1}{\sqrt{N}}$ | Some textbooks |
| Inverse-normalized | $\frac{1}{N}$ | no prefactor | Rare; avoid |
For AI: Always check which convention a library uses. NumPy uses the engineering convention: np.fft.fft applies no prefactor, np.fft.ifft applies $1/N$. PyTorch's torch.fft.fft matches NumPy. When mixing libraries or implementing from scratch, convention mismatch is a common silent bug.
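The convention difference is visible directly in NumPy, whose `norm` parameter switches between the engineering and unitary conventions:

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(32)

# Engineering convention (NumPy default): no prefactor forward, 1/N inverse
X = np.fft.fft(x)
assert np.allclose(np.fft.ifft(X), x)        # round trip restores the signal

# Symmetric (unitary) convention via norm="ortho": 1/sqrt(N) both ways
X_ortho = np.fft.fft(x, norm="ortho")
assert np.allclose(X_ortho, X / np.sqrt(len(x)))
assert np.allclose(np.fft.ifft(X_ortho, norm="ortho"), x)
```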
2.3 The DFT Matrix
The DFT is a linear map $X = F_N\, x$ from $\mathbb{C}^N$ to $\mathbb{C}^N$, where the DFT matrix $F_N$ has entries:

$$(F_N)_{kn} = \omega_N^{-kn} = e^{-2\pi i k n / N}, \qquad k, n = 0, 1, \dots, N-1.$$

The matrix is fully specified by its single generator $\omega_N$. Written out explicitly for $N = 4$ (with $\omega_4^{-1} = -i$):

$$F_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}$$

Unitarity. The DFT matrix satisfies $F_N F_N^* = N I$, where $F_N^*$ is the conjugate transpose. Equivalently, the normalized DFT matrix is unitary:

$$U = \frac{1}{\sqrt{N}} F_N, \qquad U U^* = I.$$

Proof. The $(k, m)$ entry of $F_N F_N^*$ is:

$$\sum_{n=0}^{N-1} \omega_N^{-kn}\, \overline{\omega_N^{-mn}} = \sum_{n=0}^{N-1} \omega_N^{(m-k)n} = N\, \delta_{km}$$

by the orthogonality of roots of unity. Therefore $F_N F_N^* = N I$, so $F_N^{-1} = \frac{1}{N} F_N^*$, confirming the IDFT formula: $x = \frac{1}{N} F_N^* X$.
Key structural properties of :
- Vandermonde structure: $F_N$ is a Vandermonde matrix with nodes $\omega_N^{-k}$, $k = 0, \dots, N-1$
- Symmetric: $F_N^T = F_N$ - the matrix is symmetric (not Hermitian)
- Periodic indexing: all indices are taken modulo $N$, making the transform naturally circular
2.4 DFT as a Change of Basis
The columns of $\frac{1}{\sqrt{N}} F_N^*$ (equivalently, the rows of $F_N$, conjugated and normalized) are the Fourier basis vectors:

$$u_k = \frac{1}{\sqrt{N}} \left(1,\ \omega_N^{k},\ \omega_N^{2k},\ \dots,\ \omega_N^{(N-1)k}\right)^T, \qquad k = 0, \dots, N-1.$$

These form an orthonormal basis for $\mathbb{C}^N$: $\langle u_k, u_m \rangle = \delta_{km}$.

The DFT coefficient $X[k]$ is (up to the factor $\sqrt{N}$) the coordinate of $x$ in the direction $u_k$:

$$X[k] = \sqrt{N}\, \langle x, u_k \rangle = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}.$$

Fourier basis as "frequency detector" vectors: $u_k$ oscillates at exactly $k$ cycles across the $N$ samples. If $x$ is itself a pure sinusoid at bin frequency $m$ (i.e., $x[n] = e^{2\pi i m n / N}$), then $X[k]$ is zero for all $k \neq m$ and equals $N$ at $k = m$. The DFT "detects" which frequencies are present by computing inner products with all $N$ basis sinusoids simultaneously.

Contrast with DTFT (Discrete-Time Fourier Transform): The DTFT is defined for infinite sequences as $X(e^{i\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-i\omega n}$ and produces a continuous spectrum over $\omega \in [-\pi, \pi)$. The DFT is a sampled version of the DTFT, evaluating it at the $N$ equally-spaced frequencies $\omega_k = 2\pi k / N$. The DFT is the only version that is both finite and exactly invertible.
2.5 Relation to the Continuous Fourier Transform
The DFT approximates the continuous Fourier Transform of a sampled signal. Specifically, if $x_c(t)$ is bandlimited to $[-f_s/2, f_s/2]$ and sampled at rate $f_s$:

$$X[k] \approx f_s\, \hat{x}_c(k\, \Delta f), \qquad 0 \le k < N/2,$$

where $\Delta f = f_s / N$ is the frequency bin spacing. The DFT thus provides a discretized snapshot of the continuous spectrum at $N$ frequency points.

Three key approximation errors arise:
- Aliasing: frequencies above $f_s/2$ fold back into $[0, f_s/2]$ - see Section 6.3
- Leakage: the signal is observed for only $N$ samples, equivalent to multiplying by a rectangular window, causing sidelobe contamination in the spectrum - see Section 5
- Picket fence: the DFT only evaluates the spectrum at $N$ discrete frequencies, potentially missing peaks between bins - see Section 6.2
3. Properties of the DFT
All DFT properties follow from linearity of the sum and the orthogonality of roots of unity. We state each property, prove it, and note its discrete-specific character where it differs from the continuous FT.
3.1 Linearity

$$\mathrm{DFT}\{a\,x + b\,y\}[k] = a\,X[k] + b\,Y[k] \qquad \text{for all } a, b \in \mathbb{C}.$$

Proof. Immediate from linearity of summation: $\sum_n (a\,x[n] + b\,y[n])\,\omega_N^{-kn} = a \sum_n x[n]\,\omega_N^{-kn} + b \sum_n y[n]\,\omega_N^{-kn} = a\,X[k] + b\,Y[k]$.

Linearity makes the DFT a linear operator on $\mathbb{C}^N$, represented by the matrix $F_N$.
3.2 Circular Shift
If $y[n] = x[(n - m) \bmod N]$ (shift right by $m$ positions with wrap-around), then:

$$Y[k] = e^{-2\pi i k m / N}\, X[k] = \omega_N^{-km}\, X[k].$$

Proof.

$$Y[k] = \sum_{n=0}^{N-1} x[(n-m) \bmod N]\, \omega_N^{-kn} = \sum_{n'=0}^{N-1} x[n']\, \omega_N^{-k(n'+m)} = \omega_N^{-km} \sum_{n'=0}^{N-1} x[n']\, \omega_N^{-kn'} = \omega_N^{-km}\, X[k],$$

where the substitution $n' = (n - m) \bmod N$ is valid because $\omega_N^{-kn}$ is periodic in $n$ with period $N$.

Important distinction from continuous case. In the continuous FT, a time shift by $t_0$ introduces the factor $e^{-2\pi i f t_0}$. In the DFT, the shift is circular (modular): shifting right pushes the last samples around to the first positions. If you apply a linear (non-circular) shift to a finite-length signal, the result is NOT simply the DFT multiplied by a phase factor - truncation effects occur. This is a frequent source of confusion when applying the shift property carelessly.
Application. In signal processing, circular shift models delay in a circular buffer. In machine learning, circular convolution (Section 3.6) exploits this property to compute convolutions via FFT.
3.3 Modulation (Frequency Shift)
If $y[n] = e^{2\pi i m n / N}\, x[n]$ (multiply by a complex sinusoid), then:

$$Y[k] = X[(k - m) \bmod N].$$

This is the dual of the circular shift property: multiplication in time = circular shift in frequency.
Application. Frequency-shift keying (FSK) in communications. In transformers, rotary position encoding (RoPE) multiplies embeddings by complex exponentials - this is modulation in the DFT sense, shifting the frequency-domain representation of each token by its position.
3.4 Conjugate Symmetry for Real Inputs
If $x[n] \in \mathbb{R}$ for all $n$, then:

$$X[N - k] = \overline{X[k]}, \qquad k = 1, \dots, N-1.$$

Proof. $X[N-k] = \sum_n x[n]\, e^{-2\pi i (N-k) n / N} = \sum_n x[n]\, e^{2\pi i k n / N} = \overline{\sum_n x[n]\, e^{-2\pi i k n / N}} = \overline{X[k]}$, using $e^{-2\pi i N n / N} = 1$ (so the $N$ in the exponent drops out) and $\overline{x[n]} = x[n]$ since $x$ is real.

Consequence. Only $\lfloor N/2 \rfloor + 1$ DFT coefficients are independent; the rest are determined by conjugate symmetry. NumPy's rfft exploits this to return only the non-redundant half, saving memory and computation. For AI applications working with real-valued signals (audio, time-series), always use rfft/irfft.

DC and Nyquist bins: $X[0] = \sum_n x[n]$ (always real, proportional to the mean). For even $N$: $X[N/2] = \sum_n (-1)^n x[n]$ (always real, proportional to the alternating sum).
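A quick numerical check of the symmetry and of what `rfft` returns (illustrative $N$ and random real input):

```python
import numpy as np

N = 16
x = np.random.default_rng(3).standard_normal(N)   # real input
X = np.fft.fft(x)

# Conjugate symmetry: X[N-k] = conj(X[k]) for k = 1..N-1
assert all(np.isclose(X[N - k], np.conj(X[k])) for k in range(1, N))
# DC and Nyquist bins are (numerically) real
assert np.isclose(X[0].imag, 0.0) and np.isclose(X[N // 2].imag, 0.0)

# rfft returns only the floor(N/2) + 1 non-redundant coefficients
Xr = np.fft.rfft(x)
assert len(Xr) == N // 2 + 1
assert np.allclose(Xr, X[:N // 2 + 1])
```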
3.5 Parseval's Theorem for the DFT

$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2.$$

Proof. Since $U = F_N / \sqrt{N}$ is unitary, it preserves the norm:

$$\frac{1}{N} \sum_{k} |X[k]|^2 = \frac{1}{N}\, \|F_N x\|^2 = \|U x\|^2 = \|x\|^2 = \sum_{n} |x[n]|^2.$$

Interpretation. Energy is perfectly preserved by the DFT - the total energy in the time domain equals the total energy in the frequency domain (up to the normalization factor $1/N$). This is the discrete analog of Plancherel's theorem from Section 02.
Application. The power spectral density (PSD) is $|X[k]|^2 / N$, which by Parseval sums to the total signal energy $\sum_n |x[n]|^2$. In Whisper's mel spectrogram, the log power of each FFT bin is computed and then summed over mel filterbanks.
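The identity is exact in floating point up to rounding; a two-line check on a random signal:

```python
import numpy as np

x = np.random.default_rng(4).standard_normal(64)
X = np.fft.fft(x)

time_energy = np.sum(np.abs(x) ** 2)
freq_energy = np.sum(np.abs(X) ** 2) / len(x)   # note the 1/N factor
assert np.isclose(time_energy, freq_energy)     # Parseval holds exactly
```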
3.6 Circular Convolution - Preview
The most important DFT property is the circular convolution theorem: pointwise multiplication in frequency corresponds to circular convolution in time.
Definition. The circular (cyclic) convolution of $x$ and $h$ in $\mathbb{C}^N$ is:

$$(x \circledast h)[n] = \sum_{m=0}^{N-1} x[m]\, h[(n - m) \bmod N].$$

Circular Convolution Theorem. $\mathrm{DFT}\{x \circledast h\}[k] = X[k]\, H[k]$.
Preview: This theorem is the entire foundation of FFT-based convolution. The full treatment - including the distinction between circular and linear convolution, overlap-add/save for long signals, filter design, and the connection to CNNs - is the subject of Section 20-04 Convolution Theorem. Here we note the fact; the applications follow in Section 04.
Why circular? The DFT implicitly assumes the signal is periodic with period $N$. The product $X[k]\, H[k]$ is the DFT of the circular (not linear) convolution of $x$ and $h$. To compute linear convolution via FFT, zero-pad both sequences to length $\ge N_1 + N_2 - 1$ before taking the DFT (Section 6.2).
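Both facts - the circular convolution theorem and the zero-padding trick for linear convolution - can be verified in a few lines (the helper `circular_convolution` is ours, written directly from the definition):

```python
import numpy as np

def circular_convolution(x, h):
    """Direct O(N^2) circular convolution: y[n] = sum_m x[m] h[(n-m) mod N]."""
    N = len(x)
    return np.array([sum(x[m] * h[(n - m) % N] for m in range(N))
                     for n in range(N)])

rng = np.random.default_rng(5)
x, h = rng.standard_normal(8), rng.standard_normal(8)

# Circular convolution theorem: DFT(x (*) h) = X . H
lhs = np.fft.fft(circular_convolution(x, h))
rhs = np.fft.fft(x) * np.fft.fft(h)
assert np.allclose(lhs, rhs)

# Linear convolution via zero-padding to length N1 + N2 - 1
L = len(x) + len(h) - 1
linear = np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(h, L)).real
assert np.allclose(linear, np.convolve(x, h))
```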
3.7 DFT Properties Master Table
| Property | Time domain | Frequency domain |
|---|---|---|
| Linearity | $a\,x[n] + b\,y[n]$ | $a\,X[k] + b\,Y[k]$ |
| Circular shift | $x[(n - m) \bmod N]$ | $e^{-2\pi i k m / N}\, X[k]$ |
| Modulation | $e^{2\pi i m n / N}\, x[n]$ | $X[(k - m) \bmod N]$ |
| Conjugate symmetry | $x[n] \in \mathbb{R}$ | $X[N-k] = \overline{X[k]}$ |
| Time reversal | $x[(-n) \bmod N]$ | $X[(-k) \bmod N]$ |
| Conjugation | $\overline{x[n]}$ | $\overline{X[(-k) \bmod N]}$ |
| Circular convolution | $(x \circledast h)[n]$ | $X[k]\, H[k]$ |
| Pointwise product | $x[n]\, h[n]$ | $\frac{1}{N}\,(X \circledast H)[k]$ |
| Parseval | $\sum_n \lvert x[n] \rvert^2$ | $\frac{1}{N} \sum_k \lvert X[k] \rvert^2$ |
| Duality | $X[n]$ (apply DFT twice) | $N\, x[(-k) \bmod N]$ |
4. The Fast Fourier Transform Algorithm
4.1 Complexity of Naive DFT
Direct evaluation of $X[k] = \sum_{n=0}^{N-1} x[n]\, \omega_N^{-kn}$ for all $k = 0, \dots, N-1$ requires:
- $N$ complex multiplications per output bin (multiplying each $x[n]$ by $\omega_N^{-kn}$)
- $N - 1$ complex additions per bin
- Total: $N^2$ complex multiplications, $N(N-1)$ complex additions
- Each complex multiplication = 4 real multiplications + 2 real additions

For $N = 1{,}024$: about $10^6$ complex multiplications. For $N = 65{,}536$: about $4.3 \times 10^9$. At roughly $10^9$ operations/second, a 65536-point DFT takes ~4 seconds. Audio applications need hundreds of transforms per second. The naive DFT is simply not usable.
4.2 The Cooley-Tukey Insight: Divide and Conquer
Assume $N = 2^m$ (a power of 2). Split the sum into even-indexed and odd-indexed samples:

$$X[k] = \sum_{r=0}^{N/2-1} x[2r]\, \omega_N^{-2rk} + \sum_{r=0}^{N/2-1} x[2r+1]\, \omega_N^{-(2r+1)k}.$$

Observe that $\omega_N^2 = \omega_{N/2}$. Therefore:

$$X[k] = \underbrace{\sum_{r=0}^{N/2-1} x[2r]\, \omega_{N/2}^{-rk}}_{E[k]} \;+\; \omega_N^{-k}\, \underbrace{\sum_{r=0}^{N/2-1} x[2r+1]\, \omega_{N/2}^{-rk}}_{O[k]}.$$

Both $E[k]$ and $O[k]$ are $N/2$-point DFTs, and both are periodic in $k$ with period $N/2$. For $0 \le k < N/2$:

$$X[k] = E[k] + \omega_N^{-k}\, O[k], \qquad X[k + N/2] = E[k] - \omega_N^{-k}\, O[k].$$

This is the Cooley-Tukey butterfly: two $N/2$-point DFTs plus $N/2$ complex multiplications (by the twiddle factors $\omega_N^{-k}$) yield the full $N$-point DFT.

Recursion: Apply the same split to each $N/2$-point DFT, then to each $N/4$-point DFT, and so on, until we reach 2-point DFTs (which are trivial: $X[0] = x[0] + x[1]$, $X[1] = x[0] - x[1]$).
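The recursion above translates almost line-for-line into code. A minimal sketch (clarity over speed - a production FFT is iterative and in-place):

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    E = fft_recursive(x[0::2])                        # DFT of even samples
    O = fft_recursive(x[1::2])                        # DFT of odd samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors
    return np.concatenate([E + W * O,                 # X[k],     0 <= k < N/2
                           E - W * O])                # X[k + N/2]

x = np.random.default_rng(6).standard_normal(64)
assert np.allclose(fft_recursive(x), np.fft.fft(x))
```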
4.3 Butterfly Diagram
The Cooley-Tukey recursion is beautifully represented as a signal flow graph called the butterfly diagram. For an 8-point DFT ($N = 8$, $\log_2 8 = 3$ stages):
DFT-8 BUTTERFLY DIAGRAM (Decimation in Time)
========================================================================
Input            Stage 1        Stage 2        Stage 3        Output
(bit-reversed)   four 2-pt      two 4-pt       one 8-pt
                 DFTs           DFTs           DFT

x[0] ---[2pt]----+---[4pt]------+---[8pt]------+------------- X[0]
x[4] ---[2pt]----+  W^0         |  W^0 W^2     |  W^0..W^3    X[1]
x[2] ---[2pt]----+---[4pt]------+              |               .
x[6] ---[2pt]----+  W^0         |              |               .
x[1] ---[2pt]----+---[4pt]------+---[8pt]------+               .
x[5] ---[2pt]----+  W^0         |  W^0 W^2     |
x[3] ---[2pt]----+---[4pt]------+              |
x[7] ---[2pt]----+  W^0         |              +------------- X[7]

Each butterfly node computes:
    X[k]     = E[k] + W^k * O[k]
    X[k+N/2] = E[k] - W^k * O[k]
where W^k = omega_N^{-k} = e^{-2 pi i k / N}   (twiddle factor)
========================================================================
Each butterfly is a 2-input, 2-output operation:

    a ------+--------+---- a + W*b
             \      /
              \    /
               \  /
                \/        (butterfly crossing)
                /\
               /  \
              /    \
    b --(*W)-+------+---- a - W*b
For $N = 2^m$, the diagram has $m = \log_2 N$ stages, each containing $N/2$ butterflies. Total butterflies: $\frac{N}{2} \log_2 N$.
4.4 Decimation in Time vs Decimation in Frequency
Decimation in Time (DIT): Split the input by even/odd indices. The input must be in bit-reversed order (see Section 4.5), while the output emerges in natural order. This is the form described above, and the most common in practice.
Decimation in Frequency (DIF): Split the output by even/odd frequencies. The input is in natural order, while the output emerges in bit-reversed order. The butterfly computation slightly differs but the complexity is identical.
Both variants require exactly the same number of operations. The choice is often determined by memory access patterns or hardware constraints.
In-place computation: The butterfly computation can be performed in-place - the two outputs overwrite the two inputs without needing extra memory. This makes the FFT extremely memory-efficient for large .
4.5 Bit-Reversal Permutation
The DIT-FFT requires the input in bit-reversed order. For $N = 8$ (3-bit indices):
| Natural index | Binary | Bit-reversed | Bit-reversed decimal |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |
Why bit-reversal arises: Each stage of the DIT recursion separates samples by their last bit (even vs odd) - what was the last bit of the original index becomes the first bit at the deepest recursion level. After $\log_2 N$ stages of even-odd splitting, the indices are fully bit-reversed.
Efficient computation: The bit-reversal permutation can be computed in $O(N)$ time using integer bit-reversal operations or a simple loop with bin() in Python. Hardware FFT chips often include dedicated bit-reversal circuits.
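A small (unoptimized) sketch of the permutation, matching the $N = 8$ table above:

```python
def bit_reverse_indices(N):
    """Bit-reversed ordering of 0..N-1; N must be a power of 2."""
    bits = N.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)]

# Matches the N = 8 table: natural order 0..7 -> 0, 4, 2, 6, 1, 5, 3, 7
assert bit_reverse_indices(8) == [0, 4, 2, 6, 1, 5, 3, 7]

# Bit reversal is its own inverse: applying it twice restores natural order
p = bit_reverse_indices(16)
assert [p[i] for i in p] == list(range(16))
```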
4.6 Complexity Analysis: O(N log N)
Let $M(N)$ be the number of complex multiplications required by the FFT. The recursion is:

$$M(N) = 2\, M(N/2) + \frac{N}{2},$$

where $N/2$ twiddle-factor multiplications are needed to combine two $N/2$-point DFTs. With $M(1) = 0$:

Solving the recurrence:

$$M(N) = \frac{N}{2} \log_2 N.$$

Total operations: $\frac{N}{2} \log_2 N$ complex multiplications and $N \log_2 N$ complex additions. This is $O(N \log N)$.
Comparison with naive DFT ($N^2$ multiplications):

| $N$ | FFT / DFT ratio $= \frac{\log_2 N}{2N}$ |
|---|---|
| 64 | ~4.7% |
| 1,024 | ~0.49% |

For large $N$ (e.g., $N = 2^{20}$), the FFT uses less than 0.01% of the operations required by the naive DFT.
Empirical validation: The $O(N \log N)$ curve can be confirmed by timing numpy.fft.fft over a range of sizes $N = 2^{10}, \dots, 2^{24}$ - the log-log plot of time vs $N$ should have slope close to 1 (since $N \log N$ grows only slightly faster than linear for moderate $N$).
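The recurrence and its closed form can also be checked symbolically with a tiny sketch:

```python
def fft_mults(N):
    """Complex multiplications of the radix-2 FFT: M(N) = 2 M(N/2) + N/2, M(1) = 0."""
    return 0 if N == 1 else 2 * fft_mults(N // 2) + N // 2

for m in range(1, 13):
    N = 2 ** m
    assert fft_mults(N) == (N // 2) * m   # closed form: (N/2) * log2(N)

assert fft_mults(1024) == 5120            # vs 1,048,576 for the naive DFT
```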
4.7 Variants and Extensions
Mixed-radix FFT. The pure radix-2 algorithm requires $N$ to be a power of 2. Real-world signals rarely have power-of-2 lengths. Mixed-radix algorithms factor $N$ into small prime factors and apply a generalized Cooley-Tukey recursion. With $N = N_1 N_2$, the DFT can be decomposed into $N_1$ DFTs of size $N_2$ followed by $N_2$ DFTs of size $N_1$ (plus twiddle-factor multiplications). Common radices: 2, 3, 4, 5, 7, 8.
Split-radix FFT. Combines radix-2 and radix-4 butterflies to achieve a lower operation count: approximately $\frac{N}{3} \log_2 N$ complex multiplications. The split-radix algorithm is among the most operation-efficient known for power-of-2 lengths.
Prime-length algorithms:
- Rader's algorithm (1968): Converts a prime-length DFT into a circular convolution of composite length, solvable by FFT.
- Bluestein's algorithm (1970): Converts any DFT into a convolution via the identity $nk = \frac{n^2 + k^2 - (n-k)^2}{2}$, enabling FFT-based computation for any $N$.
Real-valued FFT (rfft). For real inputs, conjugate symmetry (Section 3.4) means only $\lfloor N/2 \rfloor + 1$ output values are independent. np.fft.rfft computes only these, roughly halving computation and memory.
FFTW (Fastest Fourier Transform in the West). FFTW (Frigo & Johnson, 2005) automatically selects the best algorithm for any $N$ and hardware through a "plan" computed at initialization time. It achieves near-optimal performance across a wide range of sizes and machine architectures. MATLAB and Julia use FFTW directly; NumPy and SciPy use FFTW-style adaptive implementations under the hood.
For AI: PyTorch's torch.fft.fft uses CUDA-accelerated FFT (cuFFT) on GPU. For sequence lengths that are not powers of 2, cuFFT uses mixed-radix algorithms. When tuning sequence lengths for FFT-based models (FNet, FNO, FlashFFTConv), choosing $N = 2^k$, or more generally $N = 2^a 3^b 5^c 7^d$, avoids prime-length slowdowns.
5. Spectral Leakage and Windowing
5.1 The Leakage Problem
The DFT assumes that the $N$ samples represent one complete period of a periodic signal. In reality, we observe a finite segment of a continuous signal - we multiply by a rectangular window that is 1 inside $[0, N-1]$ and 0 outside. In frequency, multiplication by a rectangular window corresponds to convolution with the DFT of the rectangle - the Dirichlet kernel:

$$D_N(\omega) = \sum_{n=0}^{N-1} e^{-i\omega n} = e^{-i\omega (N-1)/2}\, \frac{\sin(\omega N / 2)}{\sin(\omega / 2)}.$$

For non-integer frequencies (frequencies that do not land exactly on a DFT bin), the Dirichlet kernel's large sidelobes spread energy from the true frequency into adjacent bins - this is spectral leakage.

Example. Suppose the signal is a pure cosine whose frequency $f_0$ falls between two bins (a non-integer number of cycles in the window). The true spectrum has just two peaks at $\pm f_0$. The DFT shows energy spread across all bins, with the main peak at the bin nearest $f_0$ and significant sidelobe contamination at all other bins.
Consequence for AI: In Whisper's mel spectrogram, a sharp onset (e.g., a consonant burst) at a non-bin frequency would appear smeared across multiple mel filterbanks. Window choice directly affects how well transient events are separated in the spectrogram.
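The leakage effect is easy to reproduce numerically (illustrative $N$ and frequencies):

```python
import numpy as np

N = 64
n = np.arange(N)

# Exactly 10 cycles in the window: all energy lands in bins 10 and N-10
x_exact = np.cos(2 * np.pi * 10 * n / N)
X_exact = np.abs(np.fft.fft(x_exact))
assert np.sum(X_exact > 1e-9) == 2

# 10.5 cycles (not a bin frequency): energy leaks across the whole axis
x_leaky = np.cos(2 * np.pi * 10.5 * n / N)
energy = np.abs(np.fft.fft(x_leaky)) ** 2
top2 = np.sort(energy)[-2:].sum() / energy.sum()
assert top2 < 0.9     # the two largest bins hold well under all the energy
```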
5.2 Window Functions
A window function $w[n]$ tapers the signal to zero at its endpoints, reducing the discontinuity that causes leakage. The windowed DFT is:

$$X_w[k] = \sum_{n=0}^{N-1} w[n]\, x[n]\, e^{-2\pi i k n / N}.$$

The spectrum is the DFT of $w \cdot x$ - by the pointwise-product property (Section 3.7), this equals the circular convolution of $W$ and $X$ (normalized by $1/N$). Windows with smaller sidelobes produce less leakage at the cost of a wider main lobe (reduced frequency resolution).
Standard window functions (all defined for $n = 0, \dots, N-1$):

Rectangular (Boxcar): $w[n] = 1$
- Main lobe width: 2 bins (narrowest)
- Peak sidelobe: $-13$ dB (highest leakage)
- Use case: when the signal exactly fits $N$ samples; spectral analysis of periodic signals at bin frequencies

Hann (Hanning): $w[n] = 0.5\left(1 - \cos\frac{2\pi n}{N-1}\right)$
- Main lobe width: 4 bins
- Peak sidelobe: $-31$ dB
- Use case: default choice for general spectral analysis; used in Whisper STFT

Hamming: $w[n] = 0.54 - 0.46\cos\frac{2\pi n}{N-1}$
- Main lobe width: 4 bins
- Peak sidelobe: $-43$ dB
- Use case: narrowband applications requiring low near-in sidelobes; lower peak sidelobe than Hann, but its far sidelobes decay more slowly

Blackman: $w[n] = 0.42 - 0.5\cos\frac{2\pi n}{N-1} + 0.08\cos\frac{4\pi n}{N-1}$
- Main lobe width: 6 bins
- Peak sidelobe: $-58$ dB
- Use case: high-dynamic-range spectral analysis

Kaiser: $w[n] = \dfrac{I_0\!\left(\beta\sqrt{1 - \left(\frac{2n}{N-1} - 1\right)^2}\right)}{I_0(\beta)}$

where $I_0$ is the zeroth-order modified Bessel function and $\beta$ is a shape parameter.
- $\beta = 0$: rectangular; $\beta \approx 5$: similar to Hamming; $\beta \approx 8.6$: similar to Blackman
- Near-optimal: the Kaiser window closely approximates the prolate spheroidal (DPSS) window, which maximizes the energy concentrated in the main lobe for a given width
- Use case: when precise sidelobe control is required; used in scientific spectral analysis and filter design
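As a sanity check, the closed forms above match NumPy's built-in window functions exactly (illustrative $N$):

```python
import numpy as np

N = 64
n = np.arange(N)

hann     = 0.5 * (1 - np.cos(2 * np.pi * n / (N - 1)))
hamming  = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
blackman = (0.42 - 0.5 * np.cos(2 * np.pi * n / (N - 1))
                 + 0.08 * np.cos(4 * np.pi * n / (N - 1)))

assert np.allclose(hann, np.hanning(N))
assert np.allclose(hamming, np.hamming(N))
assert np.allclose(blackman, np.blackman(N))
```

(`scipy.signal.windows` additionally provides Kaiser, Blackman-Harris, and many others.)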
5.3 The Main-Lobe/Side-Lobe Tradeoff
Every window function faces a fundamental constraint: main-lobe width (in frequency bins) and sidelobe level (in dB) cannot both be minimized simultaneously. This is a consequence of the uncertainty principle applied to finite sequences.
| Window | Main-lobe width (bins) | Peak sidelobe (dB) | Sidelobe rolloff (dB/oct) |
|---|---|---|---|
| Rectangular | 2 | $-13$ | $-6$ |
| Hann | 4 | $-31$ | $-18$ |
| Hamming | 4 | $-43$ | $-6$ |
| Blackman | 6 | $-58$ | $-18$ |
| Blackman-Harris | 8 | $-92$ | $-6$ |
| Kaiser ($\beta \approx 8.6$) | 8 | tunable via $\beta$ | $-6$ |
Coherent gain: the window amplitude gain, $\frac{1}{N}\sum_n w[n]$ (rectangular: 1; Hann: 0.5; Hamming: 0.54; Blackman: 0.42). Windows with lower sidelobes also have lower coherent gain (the main peak is smaller), which must be corrected by dividing by the coherent gain when measuring amplitudes.
Equivalent noise bandwidth (ENBW): the width of a rectangular window that would pass the same noise power. Hann has ENBW $= 1.5$ bins - it is effectively 1.5 times wider than the rectangular window for noise measurements.
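Both quantities drop out of short sums over the window samples; a sketch using the standard formulas (coherent gain = mean of $w$, ENBW = $N \sum w^2 / (\sum w)^2$ in bins):

```python
import numpy as np

N = 1024
w = np.hanning(N)

cg = w.sum() / N                          # coherent gain; Hann -> 0.5
assert np.isclose(cg, 0.5, atol=0.01)

enbw = N * np.sum(w ** 2) / w.sum() ** 2  # ENBW in bins; Hann -> 1.5
assert np.isclose(enbw, 1.5, atol=0.01)
```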
5.4 Scalloping Loss and the Picket Fence Effect
The DFT evaluates the spectrum at exactly $N$ uniformly-spaced bins. If a sinusoid's true frequency falls exactly halfway between two bins (at $k + 0.5$ in bin units), then both neighboring bins are equally attenuated. The worst-case attenuation - the scalloping loss - depends on the window:
- Rectangular: $\approx 3.9$ dB (worst case for nearest-bin amplitude estimate)
- Hann: $\approx 1.4$ dB
- Blackman: $\approx 1.1$ dB
This is the picket fence effect: the DFT spectrum looks like a picket fence - it shows signal energy at the posts (bins) but may miss peaks in the gaps between posts.
Mitigation strategies:
- Zero-padding (Section 6.2): append zeros to increase $N$ without changing the observation window, interpolating the spectrum between bins. This does NOT improve frequency resolution (the resolution $\Delta f = 1/T$ is unchanged) but allows finer frequency estimation.
- Parabolic interpolation: fit a parabola to the peak bin and its two neighbors; the parabola's maximum is a better estimate of the true frequency.
- Quinn's estimator and Jacobsen's estimator: closed-form formulas giving sub-bin frequency estimates with low computational cost.
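The parabolic-interpolation idea fits in a few lines; the sketch below (helper name and test frequency are ours) fits a parabola to the log-magnitudes of the peak bin and its neighbors:

```python
import numpy as np

def parabolic_peak(mag, k):
    """Sub-bin peak estimate from bin k and its two neighbors (log-magnitude fit)."""
    a, b, c = np.log(mag[k - 1]), np.log(mag[k]), np.log(mag[k + 1])
    return k + 0.5 * (a - c) / (a - 2 * b + c)   # offset lies in [-0.5, 0.5]

N = 256
n = np.arange(N)
f_true = 40.3                                    # cycles per window, between bins
x = np.cos(2 * np.pi * f_true * n / N) * np.hanning(N)
mag = np.abs(np.fft.rfft(x))
k = int(np.argmax(mag))

assert abs(k - f_true) > 0.25                    # raw nearest-bin estimate: ~0.3 off
assert abs(parabolic_peak(mag, k) - f_true) < 0.1
```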
5.5 Overlap-Add and Overlap-Save
For long signals (audio tracks, continuous data streams), we cannot apply a single DFT to the entire signal. We must process it in blocks. Two standard methods handle the boundary effects at block edges:
Overlap-Add (OLA):
- Segment the input into non-overlapping blocks of length $L$
- Window each block with $w[n]$ (length $L$)
- Zero-pad each block to length $L + M - 1$ (where $M$ is the filter length)
- Compute DFT-based processing on each block
- Overlap and add the outputs (adjacent outputs share $M - 1$ samples)
Overlap-Save (OLS):
- Segment the input into blocks of length $L + M - 1$ with overlap of $M - 1$ samples
- DFT-based processing on each block (no zero-padding needed)
- Discard the first $M - 1$ samples of each output block (the "contaminated" samples)
- Concatenate the remaining samples
Both methods allow exact (non-approximative) processing of long signals with FFT-based operations. The COLA (Constant Overlap-Add) condition for perfect reconstruction is:

$$\sum_{m=-\infty}^{\infty} w[n - mH] = \text{const} \quad \text{for all } n,$$

where $H$ is the hop size. Hann windows with 50% overlap ($H = N/2$) satisfy COLA, making them the standard for STFT-based audio processing.
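The COLA condition is easy to verify numerically. The sketch below uses a periodic Hann window (denominator $N$ rather than $N-1$), which satisfies COLA exactly at 50% overlap:

```python
import numpy as np

N, H = 512, 256                               # window length, 50% hop
n = np.arange(N)
w = 0.5 * (1 - np.cos(2 * np.pi * n / N))     # periodic Hann

total = np.zeros(4 * N)
for start in range(0, len(total) - N + 1, H):
    total[start:start + N] += w               # overlap-add the shifted windows

interior = total[N:-N]                        # ignore edge ramp-up/ramp-down
assert np.allclose(interior, 1.0)             # overlapped windows sum to 1
```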
6. Frequency Resolution and Zero-Padding
6.1 Frequency Resolution
The frequency resolution of the DFT is:

$$\Delta f = \frac{1}{T} = \frac{f_s}{N},$$

where $T = N / f_s$ is the total observation duration. This fundamental limit says: to distinguish two sinusoids at frequencies $f_1$ and $f_2$, you need to observe the signal for at least $T \ge 1/|f_1 - f_2|$ seconds. No amount of zero-padding or upsampling can circumvent this - it is a direct consequence of the uncertainty principle from Section 02.

The time-bandwidth product is fixed: $T \cdot \Delta f = 1$. You cannot simultaneously have:
- Short observation window (good for tracking rapid changes)
- Fine frequency resolution (good for distinguishing close frequencies)

For AI: Whisper uses a 25 ms analysis window (400 samples at 16 kHz) with 10 ms hop size, giving $\Delta f = 40$ Hz frequency resolution. This is a deliberate engineering choice: fine enough to separate speech formants (typically 100-300 Hz apart) while short enough to track fast phoneme transitions.
6.2 Zero-Padding
Zero-padding: appending $M - N$ zeros to a signal of length $N$ before computing an $M$-point DFT ($M > N$).

What zero-padding does: it computes $M$ equally-spaced samples of the continuous DTFT of the original $N$-point signal (i.e., the DTFT sampled at $M$ bins instead of $N$). This is frequency-domain interpolation - it fills in points between the original bins.

What zero-padding does NOT do: improve frequency resolution. The DTFT itself has resolution limited by $N$ (the original number of samples). Zero-padding only reveals the DTFT more finely; it cannot resolve frequencies closer than $f_s / N$ apart.
Applications of zero-padding:
- Sub-bin frequency estimation (combine with interpolation): zero-padding by a factor of 4-8 allows visual identification of spectral peaks with sub-bin accuracy.
- Linear convolution via FFT: to convolve signals of lengths $N_1$ and $N_2$ linearly (not circularly), zero-pad both to length $\ge N_1 + N_2 - 1$ before DFT multiplication (Section 3.6). The circular convolution of the zero-padded sequences equals the linear convolution of the originals.
- Next power-of-2: zero-pad to the next power of 2 to maximize FFT efficiency, e.g., 1000 -> 1024.
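The interpolation effect is visible in a few lines: zero-padding sharpens the peak-location estimate of an off-bin sinusoid without adding any information (illustrative sizes and frequency):

```python
import numpy as np

N, M = 64, 512
n = np.arange(N)
x = np.cos(2 * np.pi * 10.3 * n / N)    # 10.3 cycles: true peak between bins

X_N = np.abs(np.fft.rfft(x))            # N-point DFT: coarse bin grid
X_M = np.abs(np.fft.rfft(x, M))         # zero-padded to M: 8x finer DTFT sampling

f_N = np.argmax(X_N)                    # peak location, in cycles per window
f_M = np.argmax(X_M) * N / M
assert abs(f_N - 10.3) > 0.25           # nearest-bin estimate is ~0.3 off
assert abs(f_M - 10.3) < 0.15           # zero-padded estimate lands much closer
```

The underlying resolution $\Delta f = f_s / N$ is unchanged: two tones closer than that would still merge into one lobe.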
For AI: The Fourier Neural Operator (Section 8.2) explicitly truncates the spectrum to the lowest frequencies before learning, implicitly treating the signal as if zero-padded beyond the training domain. FlashFFTConv similarly uses zero-padding to convert long circular convolutions to linear ones.
6.3 The Nyquist-Shannon Sampling Theorem
Theorem (Shannon, 1949; Nyquist, 1928; Whittaker, 1915). If a continuous signal $x_c(t)$ is bandlimited - meaning $\hat{x}_c(f) = 0$ for $|f| > B$ - then $x_c$ can be perfectly reconstructed from samples taken at rate $f_s > 2B$.

Why the DFT cares: if $f_s < 2B$, frequencies above $f_s/2$ alias onto lower frequencies. Specifically, a sinusoid at frequency $f > f_s/2$ produces the same sample sequence as a sinusoid at frequency $f_s - f$. In the DFT, bin $k$ and bin $N - k$ represent the same physical frequency if $f_s$ is not high enough.

The aliasing formula: a signal at frequency $f$ aliases to $f_a = \min_{m \in \mathbb{Z}} |f - m f_s|$, i.e., the nearest copy within $[0, f_s/2]$.

Connection to Poisson summation (from Section 02-7.5): the spectrum of the sampled signal is the sum of periodically-shifted copies of the continuous spectrum:

$$\hat{x}_s(f) = f_s \sum_{m=-\infty}^{\infty} \hat{x}_c(f - m f_s).$$

If adjacent copies overlap (i.e., $f_s < 2B$), they contaminate each other - this is aliasing. If $f_s > 2B$, the copies are non-overlapping and the original spectrum can be recovered by applying a lowpass filter with cutoff at $f_s/2$.

Practical rule: In practice, use $f_s \ge 2.2\,B$ with an anti-aliasing lowpass filter at $f_s/2$ before the ADC. Audio CDs use $f_s = 44.1$ kHz because human hearing extends to about 20 kHz (requires $f_s \ge 40$ kHz), with the extra 4.1 kHz providing a transition band for the anti-aliasing filter.
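Aliasing can be demonstrated directly: two sinusoids separated by exactly $f_s$ produce identical samples (illustrative rates and frequencies):

```python
import numpy as np

fs, N = 100.0, 200                        # 100 Hz sampling, 2 s record
t = np.arange(N) / fs

# A 70 Hz sinusoid sampled at 100 Hz is indistinguishable from -30 Hz
x_true  = np.sin(2 * np.pi * 70 * t)
x_alias = np.sin(2 * np.pi * (70 - fs) * t)
assert np.allclose(x_true, x_alias, atol=1e-8)

# The DFT peak therefore appears at 30 Hz, not 70 Hz
freqs = np.fft.rfftfreq(N, d=1/fs)
peak = freqs[np.argmax(np.abs(np.fft.rfft(x_true)))]
assert abs(peak - 30.0) < 1e-6
```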
6.4 Practical FFT Setup Checklist
When computing an FFT-based spectral analysis, follow this sequence:
PRACTICAL FFT CHECKLIST
========================================================================
1. CHOOSE N
- Prefer N = 2^k (fastest FFT)
- If not possible: N = 2^a * 3^b * 5^c (still fast)
- N controls frequency resolution: Δf = fs / N
2. APPLY WINDOW
- Default: Hann window (good sidelobes, modest resolution loss)
- High dynamic range: Blackman or Kaiser (β ≈ 8)
- If signal is periodic with exactly N samples: rectangular
3. COMPUTE FFT
- Real input: use rfft (returns N/2+1 values)
- Complex input: use fft (returns N values)
4. BUILD FREQUENCY AXIS
- For rfft: freqs = np.fft.rfftfreq(N, d=1/fs)  ->  [0, ..., fs/2]
- For fft:  freqs = np.fft.fftfreq(N, d=1/fs)   ->  [0, ..., fs/2), then [-fs/2, ..., 0)
- After fftshift: freqs = [-fs/2, ..., 0, ..., fs/2)
5. INTERPRET MAGNITUDE
- 2|X[k]| / N -> amplitude of a real sinusoid at bin k (k != 0, N/2)
- |X[k]|^2 / N -> power spectral density (periodogram)
- Divide by the window's coherent gain for absolute amplitudes
6. HANDLE NEGATIVE FREQUENCIES
- Use fftshift(X) to center zero-frequency
- For real inputs: mirror the positive half (rfft returns one-sided)
========================================================================
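The checklist above can be collected into one small routine; a sketch (the helper `spectrum` and all parameter values are ours, not a library API):

```python
import numpy as np

def spectrum(x, fs):
    """One-sided amplitude spectrum following the checklist:
    Hann window, rfft, coherent-gain correction."""
    N = len(x)
    w = np.hanning(N)
    X = np.fft.rfft(x * w)
    freqs = np.fft.rfftfreq(N, d=1/fs)
    cg = w.sum() / N                    # coherent gain (about 0.5 for Hann)
    amp = 2 * np.abs(X) / (N * cg)      # amplitude of each real sinusoid
    amp[0] /= 2                         # DC bin is not doubled
    return freqs, amp

fs, N = 1000.0, 1024
t = np.arange(N) / fs
x = 0.7 * np.sin(2 * np.pi * 125 * t)   # 125 Hz tone, amplitude 0.7 (on-bin)

freqs, amp = spectrum(x, fs)
k = int(np.argmax(amp))
assert abs(freqs[k] - 125.0) < 1e-6     # peak found at 125 Hz
assert abs(amp[k] - 0.7) < 0.01         # amplitude recovered after correction
```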