Optimization on Manifolds: Part 3: Core Theory
3. Core Theory
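This part of Chapter 25 develops the core theory of optimization on manifolds: sphere optimization, Stiefel and Grassmann manifolds, the SPD manifold, and a preview of Riemannian line search and trust-region methods. The treatment is geometry-first and AI-facing.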
3.1 Sphere optimization
Sphere optimization, minimizing a function over unit-norm vectors, is the simplest instance of optimization on manifolds. The goal is to make curved-space reasoning concrete enough for ML practice without turning the section into a pure topology course.
Working scope for this subsection: Riemannian gradient descent, retractions, vector transport, Riemannian Hessian preview, first-order optimality, matrix manifolds, and ML optimization examples. The recurring pattern is localize, linearize, measure, move, and return to the manifold.
Operational definition.
Manifold optimization updates in a tangent space and maps the step back to the manifold with an exponential map or retraction.
Worked reading.
On the sphere, project the Euclidean gradient onto the tangent space at the current point, step along the negative projected gradient, and normalize. Normalization is a simple retraction because it returns to the sphere and agrees with the tangent direction to first order.
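The worked reading can be sketched in a few lines of NumPy. This is a minimal illustration, not the companion notebook's code: the function names and the quadratic objective `f(x) = -x^T A x` are assumptions chosen only to make the step concrete.

```python
import numpy as np

def project_to_tangent(x, g):
    """Remove the component of g along x (x is assumed to have unit norm)."""
    return g - np.dot(x, g) * x

def retract_normalize(x, v):
    """Normalization retraction: step in the ambient space, then rescale to unit length."""
    y = x + v
    return y / np.linalg.norm(y)

def sphere_step(x, euclid_grad, lr):
    """One Riemannian gradient descent step on the unit sphere."""
    rgrad = project_to_tangent(x, euclid_grad)
    return retract_normalize(x, -lr * rgrad)

# Hypothetical objective f(x) = -x^T A x with Euclidean gradient -2 A x.
A = np.diag([3.0, 1.0, 0.5])
x = np.ones(3) / np.sqrt(3.0)
x_next = sphere_step(x, -2.0 * A @ x, lr=0.1)

print("norm after step:", np.linalg.norm(x_next))
print("objective before/after:", -(x @ A @ x), -(x_next @ A @ x_next))
```

The projection and the retraction are kept as separate functions so that either can be swapped (for example, replacing normalization with the closed-form geodesic) without touching the other.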
| Geometric object | Meaning | AI interpretation |
|---|---|---|
| Manifold | Curved space with local coordinates | Data manifold, latent space, constraint set, parameter space |
| Chart | Local coordinate map | Local representation or embedding coordinates |
| Tangent space | Linearized directions at $p$ | Local perturbations, gradients, velocities |
| Metric | Inner product on $T_pM$ | Geometry-aware length, angle, steepest descent |
| Geodesic | Straightest curved-space path | Latent interpolation, shortest motion, curved optimization path |
| Retraction | Practical map from tangent step back to $M$ | Efficient constrained update in training loops |
Three examples show where the sphere pattern generalizes:
- PCA on Grassmann manifolds.
- Orthogonal weights on Stiefel manifolds.
- Covariance learning on SPD manifolds.
Two non-examples clarify the boundary:
- Euclidean gradient descent followed by arbitrary clipping.
- A projection step that destroys the first-order update direction.
Proof or verification habit for sphere optimization:
Check tangent feasibility, descent direction under the metric, and retraction properties.
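These three checks can be automated as numerical assertions. The sketch below assumes the unit sphere with the normalization retraction; the helper name `check_sphere_update`, the tolerances, and the finite-difference test are illustrative choices rather than a prescribed procedure.

```python
import numpy as np

def tangent_project(x, g):
    """Project g onto the tangent space of the unit sphere at x."""
    return g - np.dot(x, g) * x

def retract(x, v):
    """Normalization retraction on the unit sphere."""
    y = x + v
    return y / np.linalg.norm(y)

def check_sphere_update(x, euclid_grad, eps=1e-6, tol=1e-4):
    """Return a dict of boolean checks for one descent direction at x."""
    rgrad = tangent_project(x, euclid_grad)
    v = -rgrad  # candidate descent direction
    checks = {}
    # Tangent feasibility: v must be orthogonal to x.
    checks["tangent"] = bool(abs(np.dot(x, v)) < tol)
    # Descent under the embedded metric: <grad f, v> <= 0.
    checks["descent"] = bool(np.dot(rgrad, v) <= 0.0)
    # Retraction property 1: R_x(0) = x.
    checks["retract_zero"] = bool(np.allclose(retract(x, np.zeros_like(x)), x))
    # Retraction property 2: d/dt R_x(t v) at t = 0 equals v (finite difference).
    fd = (retract(x, eps * v) - x) / eps
    checks["retract_first_order"] = bool(
        np.linalg.norm(fd - v) < tol * max(1.0, np.linalg.norm(v))
    )
    # Constraint preservation after a full step.
    checks["on_manifold"] = bool(abs(np.linalg.norm(retract(x, v)) - 1.0) < 1e-12)
    return checks

report = check_sphere_update(np.array([1.0, 0.0, 0.0]), np.array([0.3, -1.0, 2.0]))
print(report)
```

Running checks like these in a unit test is one way to catch code that passes shape checks while violating the geometry.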
global object -> curved manifold or constraint set
local object -> chart, tangent space, or coordinate patch
linear operation -> derivative, gradient, velocity, Hessian approximation
geometric measure -> metric, length, distance, curvature
algorithmic move -> tangent step followed by geodesic or retraction
In AI systems, sphere optimization matters because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.
Treating the constraint set as a manifold turns constraints such as orthogonality, low rank, and positive definiteness into native geometry instead of penalties.
Mini derivation lens.
- Choose a point $p$ on the manifold $M$ and name the local representation used near $p$.
- Move the question into a chart, tangent space, or embedded constraint where first-order calculus is available.
- Compute the local object: derivative, tangent projection, metric-weighted gradient, path velocity, or retraction step.
- Translate the result back into coordinate-free language so the answer is not tied to one chart by accident.
- Check the invariant: the point remains on $M$, the direction remains in $T_pM$, or the distance/gradient uses the stated metric.
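As a worked instance of the steps above, here is the derivation on the unit sphere with the embedded Euclidean metric. The symbols $x$, $f$, $\alpha$, and $R_x$ are notation introduced for this sketch.

```latex
\begin{aligned}
&\text{Point and representation:} && x \in S^{n-1} = \{x \in \mathbb{R}^n : x^\top x = 1\} \\
&\text{Tangent space:} && T_x S^{n-1} = \{v \in \mathbb{R}^n : x^\top v = 0\} \\
&\text{Tangent projection:} && P_x = I - x x^\top \\
&\text{Riemannian gradient:} && \operatorname{grad} f(x) = P_x \nabla f(x) = \nabla f(x) - \bigl(x^\top \nabla f(x)\bigr)\, x \\
&\text{Retraction step:} && x^{+} = R_x\bigl(-\alpha \operatorname{grad} f(x)\bigr), \qquad R_x(v) = \frac{x + v}{\lVert x + v \rVert} \\
&\text{Invariants:} && x^\top \operatorname{grad} f(x) = 0, \qquad \lVert x^{+} \rVert = 1
\end{aligned}
```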
Implementation lens.
A practical ML implementation should store both the ambient array representation and the geometric contract attached to it. For example, a normalized embedding is not just a vector; it is a point on a sphere. An orthogonal weight matrix is not just a matrix; it is a point on a Stiefel-type constraint. A covariance matrix is not just a symmetric array; it must stay positive definite.
The clean computational pattern is: encode the state, compute an ambient derivative if needed, convert it into a tangent or metric-aware object, take a small local step, and then return to the manifold with a geodesic formula or retraction. This is the same pattern used in the companion notebooks, just scaled down to visible two- and three-dimensional examples.
The important warning is that coordinate code can pass shape checks while still violating geometry. Differential geometry adds checks that are semantic: tangentness, smooth compatibility, metric choice, path validity, and constraint preservation.
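One way to attach a geometric contract to an ambient array is a small wrapper type that validates the constraint on construction. The class name `SpherePoint` and its tolerance are hypothetical; the sketch only shows the idea of making the contract explicit.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True, eq=False)
class SpherePoint:
    """An ambient vector together with the contract 'unit Euclidean norm'."""
    coords: np.ndarray
    atol: float = 1e-8

    def __post_init__(self):
        # Semantic check: shape alone cannot tell us the point is on the sphere.
        if abs(np.linalg.norm(self.coords) - 1.0) > self.atol:
            raise ValueError("coords must have unit Euclidean norm")

    def is_tangent(self, v):
        """True if v is (numerically) orthogonal to the base point."""
        scale = max(1.0, float(np.linalg.norm(v)))
        return abs(float(np.dot(self.coords, v))) <= self.atol * scale

p = SpherePoint(np.array([0.0, 0.0, 1.0]))
print(p.is_tangent(np.array([1.0, 2.0, 0.0])))   # tangent at the north pole
print(p.is_tangent(np.array([0.0, 0.0, 1.0])))   # radial, so not tangent
```

The same pattern extends to orthogonality (`X.T @ X` close to the identity) and positive definiteness (smallest eigenvalue above zero), with the check adapted to each constraint.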
Practical checklist:
- State the manifold and whether it is abstract, embedded, or quotient-like.
- State the local coordinates or tangent representation being used.
- Separate ambient vectors from tangent vectors.
- Name the metric before computing distances, angles, or gradients.
- Use geodesics or retractions when moving on the manifold.
- For ML claims, identify whether geometry is data geometry, parameter geometry, or statistical geometry.
Local diagnostic: write down the gradient, the tangent projection, the retraction, and the stopping criterion.
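For the sphere, the four items of the diagnostic fit into one short loop. The sketch below assumes the objective `f(x) = -x^T A x` (so the minimizer is a leading eigenvector of `A`), a fixed step size, and a stopping rule on the Riemannian gradient norm; all names and defaults are illustrative.

```python
import numpy as np

def rgd_sphere(A, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Riemannian gradient descent for f(x) = -x^T A x on the unit sphere."""
    x = x0 / np.linalg.norm(x0)
    for k in range(max_iter):
        egrad = -2.0 * A @ x                    # gradient in the ambient space
        rgrad = egrad - np.dot(x, egrad) * x    # tangent projection
        if np.linalg.norm(rgrad) < tol:         # stopping criterion
            break
        y = x - lr * rgrad                      # tangent step
        x = y / np.linalg.norm(y)               # retraction by normalization
    return x, k

A = np.diag([3.0, 1.0, 0.5])
x_star, n_iter = rgd_sphere(A, np.ones(3))
print("iterations:", n_iter)
print("solution:", np.round(x_star, 6))
```

The stopping rule uses the norm of the projected gradient rather than the Euclidean gradient, because the Euclidean gradient does not vanish at a constrained minimizer.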
The companion notebook uses low-dimensional synthetic examples: circles, spheres, tangent projections, spherical interpolation, SPD matrices, and orthogonality constraints. These examples keep geometry visible while preserving the same update logic used in higher-dimensional ML systems.
| Compact ML phrase | Differential-geometric reading |
|---|---|
| local linearization | tangent-space approximation at a point |
| normalized embedding | point on a sphere with tangent constraints |
| natural gradient | Riemannian gradient under Fisher metric |
| orthogonal weights | point on a Stiefel-type manifold |
| latent interpolation | path that may need geodesic structure |
| covariance geometry | SPD manifold rather than arbitrary matrices |
A useful learning move is to compute everything first on a sphere. The sphere has visible curvature, simple tangent spaces, closed-form geodesics, and practical retractions. Once those are clear, Stiefel, Grassmann, SPD, and information-geometric examples become less mysterious.
For implementation, the main discipline is to avoid leaving the manifold silently. If a gradient step violates a constraint, either project the gradient onto the tangent space and retract after stepping, or use a method whose update is intrinsic by design.
The final question for this subsection is whether a Euclidean formula is being used as an approximation, a coordinate expression, or a mistaken replacement for geometry. Differential geometry is the habit of telling those cases apart.
3.2 Stiefel manifold and orthogonality constraints
This subsection treats matrices with orthonormal columns as points on a Stiefel manifold. The goal is to make curved-space reasoning concrete enough for ML practice without turning the section into a pure topology course.
Working scope for this subsection: Riemannian gradient descent, retractions, vector transport, Riemannian Hessian preview, first-order optimality, matrix manifolds, and ML optimization examples. The recurring pattern is localize, linearize, measure, move, and return to the manifold.
Operational definition.
Manifold optimization updates in a tangent space and maps the step back to the manifold with an exponential map or retraction.
Worked reading.
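On the Stiefel manifold $\mathrm{St}(n, p) = \{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\}$, project the Euclidean gradient onto the tangent space at $X$, take a step, and re-orthonormalize the columns with a QR or polar factorization. Re-orthonormalization is a retraction for the same reason normalization is one on the sphere (the case $p = 1$): it returns to the manifold and agrees with the tangent direction to first order.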
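A minimal NumPy sketch of this update follows, assuming the embedded (Euclidean) metric and a QR-based retraction. The function names are hypothetical, and the random matrices stand in for a real weight matrix and its gradient.

```python
import numpy as np

def sym(M):
    """Symmetric part of a square matrix."""
    return 0.5 * (M + M.T)

def stiefel_tangent_project(X, G):
    """Project G onto the tangent space at X under the embedded metric."""
    return G - X @ sym(X.T @ G)

def stiefel_retract_qr(X, V):
    """QR retraction: orthonormalize X + V, with signs fixed so diag(R) > 0."""
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flips columns of Q where R had a negative diagonal entry

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # a point with X^T X = I
G = rng.standard_normal((5, 2))                    # stand-in Euclidean gradient
V = stiefel_tangent_project(X, G)
Y = stiefel_retract_qr(X, -0.1 * V)

print("tangent residual:", np.linalg.norm(X.T @ V + V.T @ X))
print("orthogonality residual:", np.linalg.norm(Y.T @ Y - np.eye(2)))
```

The sign fix matters because QR factorizations are unique only up to column signs; without it, the retraction of a zero step could return a sign-flipped copy of `X`.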
Three examples place the Stiefel manifold and orthogonality constraints in context:
- PCA on Grassmann manifolds.
- Orthogonal weights on Stiefel manifolds.
- Covariance learning on SPD manifolds.
Two non-examples clarify the boundary:
- Euclidean gradient descent followed by arbitrary clipping.
- A projection step that destroys the first-order update direction.
Proof or verification habit for the Stiefel manifold and orthogonality constraints:
Check tangent feasibility, descent direction under the metric, and retraction properties.
In AI systems, Stiefel manifolds and orthogonality constraints matter because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.
Treating the constraint set as a manifold turns constraints such as orthogonality, low rank, and positive definiteness into native geometry instead of penalties.
3.3 Grassmann manifold and subspace learning
This subsection treats subspaces as points on a Grassmann manifold, which is the natural setting for subspace learning. The goal is to make curved-space reasoning concrete enough for ML practice without turning the section into a pure topology course.
Working scope for this subsection: Riemannian gradient descent, retractions, vector transport, Riemannian Hessian preview, first-order optimality, matrix manifolds, and ML optimization examples. The recurring pattern is localize, linearize, measure, move, and return to the manifold.
Operational definition.
Manifold optimization updates in a tangent space and maps the step back to the manifold with an exponential map or retraction.
Worked reading.
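A point on the Grassmann manifold $\mathrm{Gr}(n, p)$ is a $p$-dimensional subspace of $\mathbb{R}^n$, represented in code by any matrix $X$ with orthonormal columns that span it. Project the Euclidean gradient $G$ onto the directions orthogonal to the current subspace, $(I - XX^\top)G$, take a step, and re-orthonormalize. The objective must depend only on the span of $X$, not on the particular basis.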
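The sketch below applies this update to a PCA-style problem: minimize `f(X) = -trace(X^T C X)`, which depends only on the span of `X`. The function name, step size, iteration count, and the diagonal test matrix are assumptions made for illustration.

```python
import numpy as np

def grassmann_pca(C, p, lr=0.1, steps=500, seed=0):
    """Gradient descent on f(X) = -trace(X^T C X) over p-dimensional subspaces."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # random orthonormal basis
    for _ in range(steps):
        G = -2.0 * C @ X                 # Euclidean gradient of f
        H = G - X @ (X.T @ G)            # projection (I - X X^T) G
        X, _ = np.linalg.qr(X - lr * H)  # step, then re-orthonormalize
    return X

C = np.diag([5.0, 4.0, 1.0, 0.5])
X = grassmann_pca(C, p=2)
P = X @ X.T                              # projector onto the learned subspace
print(np.round(P, 6))
```

The result is compared through the projector `X @ X.T` rather than `X` itself, since two different orthonormal bases of the same subspace represent the same Grassmann point.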
Three examples place the Grassmann manifold and subspace learning in context:
- PCA on Grassmann manifolds.
- Orthogonal weights on Stiefel manifolds.
- Covariance learning on SPD manifolds.
Two non-examples clarify the boundary:
- Euclidean gradient descent followed by arbitrary clipping.
- A projection step that destroys the first-order update direction.
Proof or verification habit for the Grassmann manifold and subspace learning:
Check tangent feasibility, descent direction under the metric, and retraction properties.
In AI systems, Grassmann manifolds and subspace learning matter because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.
Treating the constraint set as a manifold turns constraints such as orthogonality, low rank, and positive definiteness into native geometry instead of penalties.
3.4 SPD manifold and covariance learning
This subsection treats symmetric positive definite (SPD) matrices as points on a manifold, which is the natural setting for covariance learning. The goal is to make curved-space reasoning concrete enough for ML practice without turning the section into a pure topology course.
Working scope for this subsection: Riemannian gradient descent, retractions, vector transport, Riemannian Hessian preview, first-order optimality, matrix manifolds, and ML optimization examples. The recurring pattern is localize, linearize, measure, move, and return to the manifold.
Operational definition.
Manifold optimization updates in a tangent space and maps the step back to the manifold with an exponential map or retraction.
Worked reading.
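On the SPD manifold, tangent vectors at $P$ are symmetric matrices, and a plain Euclidean step $P - \alpha G$ can leave the cone of positive definite matrices. Under the affine-invariant metric, the Riemannian gradient is $P \, \mathrm{sym}(G) \, P$, and the exponential map $\mathrm{Exp}_P(V) = P^{1/2} \exp(P^{-1/2} V P^{-1/2}) P^{1/2}$ keeps every iterate positive definite.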
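A small covariance-fitting example makes this concrete. The sketch assumes the affine-invariant metric (other metrics, such as log-Euclidean, are also used in practice) and the objective `f(P) = trace(P^{-1} S) + log det P`, whose minimizer is `P = S`. Helper names, the step size, and the matrix `S` are illustrative.

```python
import numpy as np

def sym(M):
    """Symmetric part of a square matrix."""
    return 0.5 * (M + M.T)

def spd_fun(P, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(P)
    return (U * fn(w)) @ U.T

def spd_exp(P, V):
    """Exponential map at P under the affine-invariant metric."""
    P_half = spd_fun(P, np.sqrt)
    P_inv_half = spd_fun(P, lambda w: 1.0 / np.sqrt(w))
    inner = sym(P_inv_half @ V @ P_inv_half)
    return sym(P_half @ spd_fun(inner, np.exp) @ P_half)

def spd_rgrad(P, egrad):
    """Riemannian gradient P sym(G) P under the affine-invariant metric."""
    return P @ sym(egrad) @ P

def nll_egrad(P, S):
    """Euclidean gradient of f(P) = trace(P^{-1} S) + log det P."""
    P_inv = np.linalg.inv(P)
    return P_inv - P_inv @ S @ P_inv

S = np.array([[2.0, 0.5], [0.5, 1.0]])   # target covariance (assumed data)
P = np.eye(2)
for _ in range(100):
    P = spd_exp(P, -0.5 * spd_rgrad(P, nll_egrad(P, S)))

print(np.round(P, 6))
print("smallest eigenvalue:", np.linalg.eigvalsh(P).min())
```

Because each update is an exponential of a symmetric matrix conjugated by `P_half`, positive definiteness is preserved by construction instead of being repaired after the fact.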
Three examples place the SPD manifold and covariance learning in context:
- PCA on Grassmann manifolds.
- Orthogonal weights on Stiefel manifolds.
- Covariance learning on SPD manifolds.
Two non-examples clarify the boundary:
- Euclidean gradient descent followed by arbitrary clipping.
- A projection step that destroys the first-order update direction.
Proof or verification habit for the SPD manifold and covariance learning:
Check tangent feasibility, descent direction under the metric, and retraction properties.
In AI systems, the SPD manifold and covariance learning matter because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.
Treating the constraint set as a manifold turns constraints such as orthogonality, low rank, and positive definiteness into native geometry instead of penalties.
3.5 Riemannian line search and trust-region preview
This subsection previews how line search and trust-region methods carry over from Euclidean optimization to Riemannian manifolds. The goal is to make curved-space reasoning concrete enough for ML practice without turning the section into a pure topology course.
Working scope for this subsection: Riemannian gradient descent, retractions, vector transport, Riemannian Hessian preview, first-order optimality, matrix manifolds, and ML optimization examples. The recurring pattern is localize, linearize, measure, move, and return to the manifold.
Operational definition.
Manifold optimization updates in a tangent space and maps the step back to the manifold with an exponential map or retraction.
Worked reading.
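On the sphere, a backtracking line search tries a step size along the negative Riemannian gradient, retracts, and accepts the step only if the objective decreases enough (the Armijo condition); otherwise it shrinks the step and tries again. A trust-region method instead minimizes a local quadratic model in the tangent space within a given radius and adjusts the radius from the ratio of actual to predicted decrease.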
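The line-search half of this reading can be sketched as follows, again on the unit sphere with the normalization retraction. The function name `armijo_step`, the constants `beta` and `c`, and the quadratic test objective are assumptions for illustration; the trust-region variant is not shown.

```python
import numpy as np

def armijo_step(f, egrad_f, x, alpha0=1.0, beta=0.5, c=1e-4, max_backtracks=30):
    """One backtracking (Armijo) step along the negative Riemannian gradient."""
    g = egrad_f(x)
    rgrad = g - np.dot(x, g) * x          # tangent projection
    slope = -np.dot(rgrad, rgrad)         # directional derivative along -rgrad
    fx = f(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        y = x - alpha * rgrad
        x_new = y / np.linalg.norm(y)     # retraction by normalization
        if f(x_new) <= fx + c * alpha * slope:
            return x_new, alpha           # sufficient decrease: accept
        alpha *= beta                     # otherwise shrink the step
    return x, 0.0                         # no acceptable step found

A = np.diag([3.0, 1.0, 0.5])

def f(x):
    return -float(x @ A @ x)

def egrad_f(x):
    return -2.0 * A @ x

x = np.ones(3) / np.sqrt(3.0)
history = [f(x)]
for _ in range(50):
    x, alpha = armijo_step(f, egrad_f, x)
    history.append(f(x))

print("final objective:", history[-1])
print("final point:", np.round(x, 6))
```

The acceptance test is evaluated on the retracted point, not on the ambient point `y`, so the decrease it certifies is a decrease on the manifold.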
Three settings where Riemannian line search and trust-region methods apply:
- PCA on Grassmann manifolds.
- Orthogonal weights on Stiefel manifolds.
- Covariance learning on SPD manifolds.
Two non-examples clarify the boundary:
- Euclidean gradient descent followed by arbitrary clipping.
- A projection step that destroys the first-order update direction.
Proof or verification habit for Riemannian line search and trust-region methods:
Check tangent feasibility, descent direction under the metric, and retraction properties.
In AI systems, Riemannian line search and trust-region methods matter because learned representations and constrained parameter spaces are rarely globally flat. A local linear approximation may be useful, but it must be attached to the point where it is valid.
Treating the constraint set as a manifold turns constraints such as orthogonality, low rank, and positive definiteness into native geometry instead of penalties.