Appendix G: Further Reading and References
G.1 Foundational Texts
For the classical theory:
- Stein, E.M. & Weiss, G. (1971). Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press. - The standard reference for rigorous Fourier analysis.
- Katznelson, Y. (2004). An Introduction to Harmonic Analysis, 3rd ed. Cambridge University Press. - Concise and rigorous; excellent for the abstract perspective.
- Dym, H. & McKean, H.P. (1972). Fourier Series and Integrals. Academic Press. - Beautiful, accessible treatment with applications to probability.
- Körner, T.W. (1988). Fourier Analysis. Cambridge University Press. - The most readable and historically motivated account.
For distributions and Schwartz space:
- Schwartz, L. (1950). Théorie des Distributions. Hermann, Paris. - The original; defines tempered distributions.
- Hörmander, L. (1990). The Analysis of Linear Partial Differential Operators I. Springer. - Comprehensive; connects distributions to PDE theory.
For signal processing:
- Oppenheim, A.V. & Schafer, R.W. (2009). Discrete-Time Signal Processing, 3rd ed. Prentice Hall. - Standard DSP textbook.
- Mallat, S. (2008). A Wavelet Tour of Signal Processing, 3rd ed. Academic Press. - Bridges Fourier analysis, wavelets, and sparse signal processing.
G.2 Machine Learning Papers
FNet:
- Lee-Thorp, J., Ainslie, J., Eckstein, I. & Ontañón, S. (2022). FNet: Mixing Tokens with Fourier Transforms. NAACL-HLT. arXiv:2105.03824.
Random Fourier Features:
- Rahimi, A. & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. NeurIPS. [NeurIPS 2017 Test-of-Time Award]
- Yu, F.X. et al. (2016). Orthogonal Random Features. NeurIPS.
- Choromanski, K. et al. (2021). Rethinking Attention with Performers. ICLR. arXiv:2009.14794.
Spectral Normalization:
- Miyato, T., Kataoka, T., Koyama, M. & Yoshida, Y. (2018). Spectral Normalization for Generative Adversarial Networks. ICLR. arXiv:1802.05957.
Fourier Neural Operator:
- Li, Z. et al. (2021). Fourier Neural Operator for Parametric Partial Differential Equations. ICLR. arXiv:2010.08895.
RoPE and Context Extension:
- Su, J. et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
- Peng, B. et al. (2023). YaRN: Efficient Context Window Extension of Large Language Models. arXiv:2309.00071.
- Ding, Y. et al. (2024). LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens. arXiv:2402.13753.
Spectral Bias:
- Rahaman, N. et al. (2019). On the Spectral Bias of Neural Networks. ICML. arXiv:1806.08734.
- Tancik, M. et al. (2020). Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. NeurIPS. arXiv:2006.10739.
Audio Models:
- Radford, A. et al. (2022). Robust Speech Recognition via Large-Scale Weak Supervision (Whisper). arXiv:2212.04356.
Alias-free GAN:
- Karras, T. et al. (2021). Alias-Free Generative Adversarial Networks. NeurIPS. arXiv:2106.12423.
G.3 Online Resources
- MIT OCW 18.103 (Fourier Analysis) - rigorous mathematical treatment with lecture notes
- Stanford EE261 (The Fourier Transform and its Applications) - engineering-oriented, excellent visualization
- 3Blue1Brown "But what is the Fourier Transform?" - geometric intuition for absolute beginners
- Seeing Theory (Brown University) - interactive visualizations of Fourier series and transforms
Appendix H: Self-Assessment Questions
These questions test conceptual understanding without computation. Answer each in 2-3 sentences.
1. Explain in words why the Fourier Transform of a spike (Dirac delta) is a constant (flat spectrum). What physical interpretation does this have?
2. A signal engineer claims: "If I double the duration of my pulse, I can halve its bandwidth." Is this claim correct, and what theorem guarantees it? What is the fundamental limit on simultaneous time and frequency concentration?
3. What is the difference between the Parseval identity from Fourier series (Section 01) and Plancherel's theorem from this section? Are they the same result in different forms?
4. Why does the Fourier Transform of a real-valued function satisfy conjugate symmetry, i.e. f̂(−ω) equals the complex conjugate of f̂(ω)? What does this mean for the magnitude and phase spectra?
5. Explain why FNet achieves near-BERT performance on classification tasks but not on extractive QA. What mathematical property of the DFT explains this performance gap?
6. In RoPE, the frequencies are chosen geometrically. Why geometric spacing? What would happen if the frequencies were linearly spaced?
7. Bochner's theorem says a shift-invariant kernel is positive definite if and only if it is the Fourier Transform of a non-negative measure. What breaks down if the spectral density is not non-negative? Give an example.
8. Explain the connection between the Poisson summation formula and Shannon's sampling theorem. Why does undersampling (below the Nyquist rate) cause aliasing?
9. The heat equation solution is a convolution with a Gaussian that spreads over time. In the Fourier domain, how does the solution evolve? Which frequencies decay fastest, and why?
10. The differentiation property states that differentiating f multiplies its transform by iω: f′ ↦ iω f̂(ω). Use this to explain why smoother functions have faster spectral decay. What happens for a function with a jump discontinuity?
11. In spectral normalization, each weight matrix of the discriminator is normalized by its spectral norm σ(W), the largest singular value: W ↦ W/σ(W). Why does this enforce a Lipschitz constraint? How does the Lipschitz constant relate to training stability?
12. The Gaussian is the unique minimizer of the uncertainty bound. Does this mean we should always use Gaussian signals in practice? What are the tradeoffs?
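The delta-spike and conjugate-symmetry questions above can be checked numerically. A minimal sketch using NumPy's FFT (the length 64 and the random seed are arbitrary choices, not from the lesson):

```python
import numpy as np

N = 64  # array length is arbitrary

# A discrete spike: delta at n = 0. Its DFT is exactly the constant 1
# in every frequency bin -- the "flat spectrum" of the delta question.
delta = np.zeros(N)
delta[0] = 1.0
print(np.allclose(np.fft.fft(delta), np.ones(N)))  # True

# Conjugate symmetry of a real signal: X[N-k] = conj(X[k]).
x = np.random.default_rng(0).standard_normal(N)
X = np.fft.fft(x)
print(np.allclose(X[1:][::-1], np.conj(X[1:])))  # True
```

In the discrete setting, conjugate symmetry reads X[N−k] = conj(X[k]); the second check verifies exactly that identity.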
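The aliasing question also admits a compact numerical illustration: once sampled, a tone above the Nyquist frequency is indistinguishable from a lower-frequency tone. A sketch assuming a 10 Hz sample rate (so the Nyquist frequency is 5 Hz):

```python
import numpy as np

fs = 10.0                    # sample rate (Hz); Nyquist frequency is fs/2 = 5 Hz
t = np.arange(0, 1, 1 / fs)  # one second of samples

# A 9 Hz tone sampled at 10 Hz aliases to |9 - 10| = 1 Hz:
# the two sampled sequences coincide exactly.
high = np.cos(2 * np.pi * 9 * t)
low = np.cos(2 * np.pi * 1 * t)
print(np.allclose(high, low))  # True
```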
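For the spectral-normalization question: in practice the largest singular value is estimated by power iteration rather than a full SVD. A minimal sketch (the function name, matrix shape, and iteration count are illustrative, not taken from Miyato et al.):

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v  # approximately sigma_max(W)

W = np.random.default_rng(1).standard_normal((8, 4))
W_sn = W / spectral_norm(W)  # normalized weight: sigma_max(W_sn) ~ 1

# Compare against the exact value from an SVD.
print(np.isclose(np.linalg.svd(W_sn, compute_uv=False)[0], 1.0, rtol=1e-3))
```

Dividing by the spectral norm makes each linear layer 1-Lipschitz with respect to the Euclidean norm, which is the property the question asks you to justify.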
End of Section 20-02 Fourier Transform.
Next: Section 20-03 DFT and FFT - discretizing the Fourier Transform for computation via the Cooley-Tukey algorithm.