Attention Is All You Need
2 min
RNNs (Before 2017)
Process words one at a time, left to right. Long sequences lose early context. Training is slow and sequential.
Tap reveal to see the transformation
In 2017, a team of eight researchers, primarily from Google, published a paper with a bold title: "Attention Is All You Need." Before this, AI language models relied on recurrent neural networks that processed words one at a time, like reading through a keyhole. The transformer architecture they proposed could look at an entire sentence at once, understanding how every word relates to every other word simultaneously. The result was dramatic: models trained faster, performed better, and scaled to sizes previously thought impractical. Within a few years, every major language model (GPT, BERT, Claude, Gemini) was built on this architecture. A single paper didn't just advance the field; it redefined it entirely.
The paper that changed everything, in plain English.