Lighthouse Attention: The Training-Time Hierarchy That Makes Quadratic Attention Practical Again
Lighthouse Attention is a training-only hierarchical attention mechanism that wraps standard scaled dot-product attention (SDPA) without modifying it. Through symmetric Q/K/V pyramid pooling, parameter-free scoring, and a two-stage training recipe with dense recovery, it delivers a 1.4–1.7× wall-clock speedup at 32K–128K context wh