
Linear Transformations: Appendix N (Notation Summary for This Section) to Appendix P (Quick Reference: Common Linear Maps in $\mathbb{R}^2$ and $\mathbb{R}^3$)

Appendix N: Notation Summary for This Section

| Symbol | Meaning |
| --- | --- |
| $T: V \to W$ | Linear transformation from $V$ to $W$ |
| $\ker(T)$ | Kernel (null space) of $T$ |
| $\operatorname{im}(T)$ | Image (range) of $T$ |
| $\operatorname{rank}(T)$ | Dimension of $\operatorname{im}(T)$ |
| $\operatorname{nullity}(T)$ | Dimension of $\ker(T)$ |
| $[T]_{\mathcal{B}}^{\mathcal{C}}$ | Matrix of $T$ from basis $\mathcal{B}$ to basis $\mathcal{C}$ |
| $P$ | Change-of-basis matrix |
| $T^\top$ | Dual (transpose) map |
| $V^*$ | Dual space of $V$ |
| $Df_{\mathbf{x}}$ | Total derivative (Fréchet derivative) of $f$ at $\mathbf{x}$ |
| $J_f(\mathbf{x})$ | Jacobian matrix of $f$ at $\mathbf{x}$ |
| $\mathcal{L}(V, W)$ | Space of all linear maps from $V$ to $W$ |
| $V \cong W$ | $V$ and $W$ are isomorphic |
| $V/K$ | Quotient space of $V$ modulo subspace $K$ |
| $A \sim B$ | $A$ and $B$ are similar matrices ($B = P^{-1}AP$) |
| $P^2 = P$ | Projection (idempotent) |
| $A^\top A = I$ | Orthogonal matrix |
| $f(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$ | Affine map |
| $\Delta W = BA$ | LoRA low-rank update, $\operatorname{rank}(\Delta W) \leq r$ |
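A quick sanity check on the last row: if $B$ is $d \times r$ and $A$ is $r \times d$, then $BA$ factors through an $r$-dimensional space, so its rank is at most $r$. A minimal sketch (NumPy; the dimensions $d = 64$, $r = 4$ are arbitrary illustrative choices):

```python
# rank(BA) <= r: the product factors through an r-dimensional space.
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4
B = rng.normal(size=(d, r))    # d x r
A = rng.normal(size=(r, d))    # r x d
delta_W = B @ A                # d x d, but rank at most r
print(np.linalg.matrix_rank(delta_W))  # -> 4 (= r, almost surely)
```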

Appendix O: Linear Maps and Symmetry

O.1 Equivariant Maps

A linear map $T: V \to W$ is equivariant with respect to a group $G$ if, for every $g \in G$ and every $\mathbf{v} \in V$:

$$T(\rho_V(g)\mathbf{v}) = \rho_W(g)\,T(\mathbf{v})$$

where $\rho_V: G \to GL(V)$ and $\rho_W: G \to GL(W)$ are representations of $G$ on $V$ and $W$.

Intuitively: "applying the group action then the map = applying the map then the group action." The map commutes with the symmetry.

Examples:

  • Translation equivariance: $T(v + c) = T(v) + T(c)$ - but this is just additivity, so every linear map is equivariant to the translation group on vector spaces in this sense.
  • Rotation equivariance: $T(R\mathbf{v}) = R\,T(\mathbf{v})$ for all rotations $R$. In 3D this forces $T = \lambda I$ for some scalar $\lambda$ (Schur's lemma applied to the rotation representation).
  • Permutation equivariance: $T(P\mathbf{v}) = P\,T(\mathbf{v})$ for all permutation matrices $P$. This forces $T$ to be a sum of a "same-position" term and a "mean-field" term ($T = aI + b\,\mathbf{1}\mathbf{1}^\top$) - this is why mean pooling and attention with tied weights are permutation equivariant.

For AI: CNNs achieve translation equivariance by using convolutional (shared-weight) linear maps. Equivariant graph neural networks use permutation-equivariant maps. Geometric deep learning is the systematic study of building neural networks as compositions of equivariant linear maps. Transformer attention (without positional encoding) is permutation equivariant - adding positional encodings explicitly breaks this symmetry.
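To make the permutation case concrete, here is a minimal numerical sketch (NumPy; the map $T(\mathbf{v}) = a\mathbf{v} + b\,\bar{v}\,\mathbf{1}$ and the constants $a, b$ are illustrative choices, not from the lesson) confirming $T(P\mathbf{v}) = P\,T(\mathbf{v})$ for random permutation matrices:

```python
# Checking permutation equivariance of T(v) = a*v + b*mean(v)*1,
# an instance of the "same-position + mean-field" form above.
import numpy as np

rng = np.random.default_rng(0)
n, a, b = 5, 2.0, -0.7   # arbitrary illustrative constants

def T(v):
    return a * v + b * v.mean() * np.ones_like(v)

v = rng.normal(size=n)
for _ in range(3):
    P = np.eye(n)[rng.permutation(n)]       # random permutation matrix
    assert np.allclose(T(P @ v), P @ T(v))  # T(Pv) = P T(v) exactly
print("permutation equivariance holds for all tested P")
```

The check passes for any $a, b$: permuting the entries leaves the mean unchanged, so the mean-field term commutes with $P$ automatically.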

O.2 Schur's Lemma and Irreducible Representations

Schur's Lemma. Let $T: V \to V$ be a linear map that commutes with all maps in an irreducible representation of a group $G$ (i.e., $T\rho(g) = \rho(g)T$ for all $g \in G$). Then $T = \lambda I$ for some scalar $\lambda$.

This powerful result says: the only linear maps that commute with all symmetries of an irreducible representation are scalar multiples of the identity. This constrains the form of equivariant maps.

Application: If attention weights must be equivariant to the representation of a certain symmetry group acting on the heads, Schur's lemma constrains the possible attention patterns.
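One way to see the lemma numerically is by "twirling": the group average $\overline{T} = \mathbb{E}_R[R\,T\,R^\top]$ commutes with every rotation, so by Schur's lemma it must equal $\lambda I$ with $\lambda = \operatorname{tr}(T)/3$. A Monte Carlo sketch (NumPy; the QR-based random-rotation sampler is a standard construction, not part of the lesson):

```python
# Monte Carlo "twirl": averaging R T R^T over random 3D rotations projects
# T onto the maps that commute with every rotation; by Schur's lemma this
# projection is (tr(T)/3) * I.
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(rng):
    # QR of a Gaussian matrix gives a Haar-random orthogonal matrix once
    # the signs are fixed; flipping one column forces det = +1 (SO(3)).
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

T = rng.normal(size=(3, 3))
avg = np.zeros((3, 3))
N = 10000
for _ in range(N):
    R = random_rotation(rng)
    avg += R @ T @ R.T
avg /= N

print(np.round(avg, 2))           # approx. (tr(T)/3) * I
print(round(np.trace(T) / 3, 2))  # the scalar Schur's lemma predicts
```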

O.3 Representation Theory Preview

Representation theory studies how groups act on vector spaces via linear maps. Every group representation $\rho: G \to GL(V)$ is a group homomorphism - a map that takes group elements to invertible linear maps, preserving the group structure:

$$\rho(gh) = \rho(g)\rho(h) \quad \text{(composition respects group multiplication)}$$

This is the language in which equivariant neural networks (E(3)-equivariant networks for molecular property prediction, SE(3)-equivariant networks for robotics, permutation-equivariant networks for sets) are designed. The "weights" of an equivariant linear layer are constrained to be equivariant - and representation theory tells you exactly what form these weights can take.
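As a concrete toy case, the following sketch (Python; the helper names `rho` and `compose` are illustrative, not from the lesson) verifies the homomorphism property for $S_3$ acting on $\mathbb{R}^3$ by permutation matrices:

```python
# S_3 acting on R^3 by permutation matrices: rho(p) sends e_i to e_{p(i)}.
# We verify rho(p o q) = rho(p) rho(q) for all 36 pairs of permutations.
import itertools
import numpy as np

def rho(p):
    M = np.zeros((len(p), len(p)))
    for i, pi in enumerate(p):
        M[pi, i] = 1.0    # column i has a single 1 in row p(i)
    return M

def compose(p, q):
    # (p o q)(i) = p(q(i)): apply q first, then p
    return tuple(p[q[i]] for i in range(len(q)))

perms = list(itertools.permutations(range(3)))
for p, q in itertools.product(perms, perms):
    assert np.allclose(rho(compose(p, q)), rho(p) @ rho(q))
print(f"rho(gh) = rho(g)rho(h) verified on all {len(perms)**2} pairs")
```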


Appendix P: Quick Reference - Common Linear Maps in $\mathbb{R}^2$ and $\mathbb{R}^3$

Common $2 \times 2$ Linear Maps

| Transformation | Matrix | Properties |
| --- | --- | --- |
| Rotation by $\theta$ | $\begin{pmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix}$ | Orthogonal, $\det=1$, $\lambda = e^{\pm i\theta}$ |
| Reflection across $x$-axis | $\begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}$ | Symmetric, $\det=-1$, $\lambda = \pm 1$ |
| Reflection across $y=x$ | $\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}$ | Symmetric, $\det=-1$, $\lambda = \pm 1$ |
| Horizontal shear by $k$ | $\begin{pmatrix}1 & k \\ 0 & 1\end{pmatrix}$ | $\det=1$, $\lambda = 1$ (double) |
| Scaling by $(a, b)$ | $\begin{pmatrix}a & 0 \\ 0 & b\end{pmatrix}$ | Symmetric, $\det=ab$, $\lambda = a, b$ |
| Projection onto $x$-axis | $\begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}$ | Symmetric, idempotent, $\det=0$, $\lambda = 0, 1$ |
| Projection onto $y=x$ | $\frac{1}{2}\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix}$ | Symmetric, idempotent, $\lambda = 0, 1$ |
| Zero map | $\begin{pmatrix}0 & 0 \\ 0 & 0\end{pmatrix}$ | Rank 0, $\det=0$, $\lambda = 0$ |
| Identity | $\begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}$ | All eigenvalues 1, $\det=1$ |

All these are linear maps. To make them affine (include translation), append a row and column in homogeneous form.
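A brief sketch of that homogeneous-form trick (NumPy; the `homogeneous` helper and the 90-degree rotation example are illustrative choices):

```python
# Lifting a 2x2 linear map A plus a translation t to one 3x3 matrix:
# points are written as (x, y, 1), and H acts by x -> A x + t.
import numpy as np

def homogeneous(A, t):
    H = np.eye(3)
    H[:2, :2] = A     # linear part in the top-left block
    H[:2, 2] = t      # translation in the last column
    return H

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = homogeneous(R, t=np.array([3.0, 0.0]))  # rotate 90 degrees, then shift +3 in x

p = np.array([1.0, 0.0, 1.0])  # the point (1, 0) in homogeneous coordinates
print(H @ p)                   # -> [3. 1. 1.], i.e. (1,0) -> (0,1) -> (3,1)
```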

Common $3 \times 3$ Linear Maps

| Transformation | Description | Key Properties |
| --- | --- | --- |
| Rotation around $z$-axis | $R_z(\theta)$: rotates $xy$-plane, fixes $z$ | Orthogonal, $\det=1$ |
| Reflection across $xy$-plane | $\operatorname{diag}(1,1,-1)$ | Symmetric, $\det=-1$ |
| Projection onto $xy$-plane | $\operatorname{diag}(1,1,0)$ | Symmetric, idempotent, rank 2 |
| Householder reflection | $I - 2\mathbf{n}\mathbf{n}^\top$ ($\mathbf{n}$ a unit vector) | Symmetric, $\det=-1$, $\lambda = 1$ (mult. 2) and $-1$ |
| Scaling | $\operatorname{diag}(a,b,c)$ | Diagonal; eigenvalues are $a,b,c$ |
| Shear | $I + s\,\mathbf{e}_i\mathbf{e}_j^\top$ ($i \neq j$) | $\det=1$; all eigenvalues 1 |
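A quick numerical check of the Householder row above (NumPy; the random unit normal is an arbitrary choice):

```python
# Checking the Householder row: H = I - 2 n n^T with ||n|| = 1 is
# symmetric and orthogonal, with det = -1 and eigenvalues {1, 1, -1}.
import numpy as np

rng = np.random.default_rng(0)
n = rng.normal(size=3)
n /= np.linalg.norm(n)          # the formula assumes a unit normal

H = np.eye(3) - 2.0 * np.outer(n, n)

assert np.allclose(H, H.T)              # symmetric
assert np.allclose(H @ H.T, np.eye(3))  # orthogonal
print(round(np.linalg.det(H), 6))       # -> -1.0
print(np.sort(np.linalg.eigvalsh(H)))   # -> [-1.  1.  1.]
```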

End of Linear Transformations section. Continue to 05: Orthogonality and Orthonormality.
