kNN-Attention: O(n log n) Attention by Doing the Perfectly Obvious
Author: Alice Luna Sparkuhl
Abstract
We present kNN-Attention, an O(n log n) attention mechanism that retains full-recall access to every token in the context while keeping the per-token cost logarithmic in context length. By treating attention as a database query and using a Hierarchical Navigable Small World (HNSW) index, we retrieve only the top-k most relevant tokens for each attention computation. This enables contexts of arbitrary length without the lossy compression inherent in linear attention variants. While current CPU-based implementations incur overhead that only pays off beyond roughly 100k tokens, the asymptotic gains are dramatic: extrapolations suggest a billion-token context could be processed in approximately 4 hours, versus roughly a year with standard attention. The method is compatible with existing MHA/GQA implementations and requires no architectural changes during initial training. We discuss practical implementation challenges, present initial results, and outline future research directions, including memory compression techniques and self-organizing memory systems.
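To make the core idea concrete, the following is a minimal single-head sketch, assuming the hnswlib library for approximate nearest-neighbour search; the function name, parameters, and index settings are illustrative choices, not the paper's actual implementation. It builds an HNSW index over the keys, retrieves the top-k keys per query, and runs ordinary softmax attention over only that retrieved subset.

```python
import numpy as np
import hnswlib


def knn_attention(q, k_mat, v_mat, top_k=32, ef=64):
    """Approximate attention: each query attends only to its top_k nearest keys.

    q:     (n_q, d)  query vectors
    k_mat: (n_kv, d) key vectors
    v_mat: (n_kv, d) value vectors
    (Hypothetical sketch; parameter defaults are illustrative, not from the paper.)
    """
    n_kv, d = k_mat.shape

    # Build an HNSW index over the keys using inner-product similarity.
    index = hnswlib.Index(space="ip", dim=d)
    index.init_index(max_elements=n_kv, ef_construction=200, M=16)
    index.add_items(k_mat, np.arange(n_kv))
    index.set_ef(max(ef, top_k))

    # Retrieve the top_k most similar keys for every query; search cost is
    # roughly logarithmic in the number of stored keys.
    labels, _ = index.knn_query(q, k=top_k)          # (n_q, top_k)

    # Gather the retrieved keys/values and run standard softmax attention on them.
    k_sel = k_mat[labels]                            # (n_q, top_k, d)
    v_sel = v_mat[labels]                            # (n_q, top_k, d)
    scores = np.einsum("qd,qkd->qk", q, k_sel) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("qk,qkd->qd", weights, v_sel)   # (n_q, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 10_000, 64
    q = rng.standard_normal((4, d)).astype(np.float32)
    k_mat = rng.standard_normal((n, d)).astype(np.float32)
    v_mat = rng.standard_normal((n, d)).astype(np.float32)
    print(knn_attention(q, k_mat, v_mat).shape)      # (4, 64)
```

In this sketch the index is rebuilt from scratch; in an autoregressive setting one would instead append each new key to a persistent index as it is generated, which is what makes the per-token cost scale with log n rather than n.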