Computational Geometrization
"Given Tri Dao, why not Quad Dao?"
Danielle Fong · March 2026 · Map Reduce for Thought
14 proven · 8 conjectures · 5 refuted · 12 physicists · 4 AI models · 151 KB evidence
The 8 Shapes of Computation
Thurston conjectured (and proved in many cases), and Perelman completed the proof,
that every closed 3D shape can be cut into pieces, each built from one of
exactly 8 types of geometric clay. No more. No less. The pieces are glued together
along donut-shaped boundaries called tori.
Four of these clays let you spread out slowly (polynomial growth).
Four let you spread out fast (exponential growth).
If you're a colony of tiny computers living inside one of these shapes,
the geometry determines how much you can compute.
This is a theorem, not an analogy.
〰️ H²×R · saddle layers · e^r · PSPACE
♾️ SL̃(2,R) · Möbius flow · e^r · PSPACE
🍬 Sol · taffy pull · e^r (!) · ???
Blue = polynomial growth = class P ·
Red = exponential growth = PSPACE ·
Gold = Sol (the genuine open problem)
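The polynomial versus exponential split in the legend can be checked numerically. The sketch below compares the volume of a ball of radius r in flat space E³ with one in hyperbolic space H³. H³ is used here only as an assumed stand-in for the exponential family, because its ball volume has a simple closed form, π(sinh 2r − 2r) at curvature −1. The function names are illustrative and not from the original.

```python
import math

def ball_volume_e3(r):
    """Volume of a radius-r ball in flat E^3: grows like r^3."""
    return 4.0 / 3.0 * math.pi * r ** 3

def ball_volume_h3(r):
    """Volume of a radius-r ball in H^3 (curvature -1): grows like e^(2r)."""
    return math.pi * (math.sinh(2 * r) - 2 * r)

# Doubling the radius multiplies the flat volume by exactly 8,
# while the hyperbolic volume is multiplied by a factor that itself keeps growing.
for r in (1, 2, 4, 8):
    print(f"r={r}: E3={ball_volume_e3(r):.3e}  H3={ball_volume_h3(r):.3e}")
```

For very small r the two volumes nearly agree, since hyperbolic space looks flat up close. The gap only opens as the colony spreads out.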
The Socratic Argument
SOCRATES
When you tile FlashAttention into SRAM blocks — what do you throw away at each boundary?
DANIELLE
The attention scores. You keep (m, l): the running max and the running sum of exponentials, which together give the log-sum-exp.
SOCRATES
And why does that work?
DANIELLE
Because merging log-sum-exp statistics is associative. You can split and merge in any order and get the same answer, up to floating-point rounding.
SOCRATES
Could every computation be split this way without error?
DANIELLE
...no. Most operations aren't associative.
SOCRATES
So what's special about the ones that are?
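The split-and-merge that Danielle describes can be sketched in a few lines. This is a minimal illustration, not FlashAttention itself: it tracks only the per-tile summary (m, l), with m the tile max and l the sum of exponentials shifted by m, and leaves out the output accumulator that a real kernel also rescales. The function names and the sample scores are made up for the example.

```python
import math
from functools import reduce

def tile_stats(scores):
    """Summarize one tile of scores as (m, l): max and shifted sum of exponentials."""
    m = max(scores)
    l = sum(math.exp(s - m) for s in scores)
    return m, l

def merge(a, b):
    """Combine two tile summaries. This is the associative operation."""
    (m1, l1), (m2, l2) = a, b
    m = max(m1, m2)
    l = l1 * math.exp(m1 - m) + l2 * math.exp(m2 - m)
    return m, l

def logsumexp(stats):
    """Recover log(sum(exp(scores))) from a summary."""
    m, l = stats
    return m + math.log(l)

scores = [0.5, -1.2, 3.0, 2.2, -0.7, 1.9, 0.0, 4.1]
tiles = [scores[i:i + 2] for i in range(0, len(scores), 2)]
parts = [tile_stats(t) for t in tiles]

left = reduce(merge, parts)  # ((a . b) . c) . d
right = merge(parts[0], merge(parts[1], merge(parts[2], parts[3])))  # a . (b . (c . d))
full = tile_stats(scores)  # no tiling at all
```

All three groupings give the same log-sum-exp up to rounding, which is why the tile order and tile size do not change the result.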
FlashAttention = Topological Surgery
Flat piece (E³) ↔ SRAM tile
Full manifold (too big) ↔ Full attention matrix
Torus boundary (Z²) ↔ Tile boundary → (m, l)
Incompressible torus ↔ Can't eliminate this boundary
π₁ trivial inside each piece ↔ log-sum-exp is associative
JSJ = fewest possible tori ↔ Optimal tiling strategy
The Thermodynamic Tax
LANDAUER
Every bit you destroy costs at least kT ln 2 of heat. My principle (1961). Verified experimentally (Bérut et al. 2012).
LANDAUER
Each FlashAttention tile boundary destroys ~7 bits per query row. 256 scores compressed to 2 statistics. That's genuine information erasure.
SOCRATES
Does the canonical topological decomposition minimize this cost?
LANDAUER
Yes. Non-trivially. Incompressibility equals information-theoretic necessity. Every JSJ torus carries irreducible information. The canonical decomposition IS thermodynamically optimal. That surprised me.
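The numbers in Landauer's lines can be worked through as a rough check. The sketch assumes the "~7 bits" figure comes from log2(256 / 2), that is, 256 scores reduced to 2 statistics. That is one reading of the count, not something the text states outright. The 300 K temperature and the function name are also assumptions made for the example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_joules(bits, temperature_kelvin=300.0):
    """Minimum heat in joules to erase `bits` bits at the given temperature."""
    return bits * K_B * temperature_kelvin * math.log(2)

scores_per_row = 256  # scores in one tile row, from the text
stats_kept = 2        # the (m, l) pair

# Assumed counting rule: bits erased = log2 of the compression ratio.
bits_erased = math.log2(scores_per_row / stats_kept)
cost = landauer_joules(bits_erased)

print(f"{bits_erased:.0f} bits -> at least {cost:.2e} J per query row per boundary")
```

At room temperature this bound is on the order of 10⁻²⁰ joules per row, far below what real hardware dissipates, so the tax is a floor rather than an estimate of actual energy use.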
The Punchline: Why 3?
2D
Genus classification
Too simple.
Nothing interesting happens.
3D
Thurston's 8 geometries
The sweet spot.
Rich enough for PSPACE.
Structured enough to classify.
4D
Markov 1958
Undecidable.
Can never find optimal cuts.
Uncountably many exotic structures.
Given Tri Dao, why not Quad Dao?
Because three is the richest dimension in which computation is still classifiable.
FlashAttention is the existence proof — it tiles attention along JSJ boundaries and achieves
exact results because the algebra is associative. The 8 Thurston geometries are the
periodic table of computational atoms. The torus boundaries are where you pay the
thermodynamic tax. The optimal decomposition — minimum tax, exact computation — is JSJ.
In 4D, none of this works. You can never classify the shapes. You can never find the optimal cuts.
π₁ is the Dao. The fundamental group — what doesn't change when everything else does.
The Way persists through all transformation.
Nominative determinism stays winning.