🐋 Native Sparse Attention

A whale-scale leap for massive contexts


Native Sparse Attention (NSA) introduces design principles for scalable Transformer-based architectures, using hardware-friendly sparse attention patterns to make long-context modeling and large-scale training more efficient.

In this series, we begin with the motivation, background, and prior work in sparse attention before diving into the NSA architecture, its modules and branches, hardware considerations, and a closing look at evaluations and broader discussion.

A Deep Research series