
The Trillion Token Diet

2 min

13 trillion+
Tokens in GPT-4 training data (reported)
$100 million+
Estimated training cost
Tens of thousands
GPUs running for months

Training a large language model is one of the most resource-intensive computing tasks ever undertaken. GPT-4 was reportedly trained on over 13 trillion tokens, which by rough estimates is a few times the text of every book in the Library of Congress. The training run likely kept tens of thousands of GPUs running continuously for months, consuming enough electricity to power a small city, and its estimated cost exceeded $100 million.

But raw data and compute are just the beginning. After pre-training comes instruction tuning on carefully curated examples, then reinforcement learning from human feedback (RLHF), in which human evaluators rank the model's outputs, a reward model learns from those rankings, and the model is optimized against that reward to align it with human preferences. Each stage fundamentally changes what the model can do.
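The scale figures above can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: apart from the reported 13 trillion tokens, every input (about 25 million books, 100,000 words per book, 1.3 tokens per word, 25,000 GPUs, 90 days, $2 per GPU-hour) is an assumption chosen for the example, not a disclosed figure, and the helper names are hypothetical.

```python
# Back-of-envelope scale estimates. Apart from the reported token count,
# every input below is an illustrative ASSUMPTION, not a disclosed GPT-4 figure.

REPORTED_TOKENS = 13e12       # reported GPT-4 training tokens (~13 trillion)

# Assumed library-comparison inputs (hypothetical)
LOC_BOOKS = 25e6              # assumed: ~25 million cataloged books
WORDS_PER_BOOK = 100_000      # assumed average book length
TOKENS_PER_WORD = 1.3         # assumed tokens per English word

# Assumed compute inputs (hypothetical)
NUM_GPUS = 25_000             # assumed stand-in for "tens of thousands"
TRAINING_DAYS = 90            # assumed stand-in for "months"
USD_PER_GPU_HOUR = 2.0        # assumed hourly price per GPU


def library_multiple(tokens: float) -> float:
    """How many copies of the assumed book collection fit into `tokens`."""
    library_tokens = LOC_BOOKS * WORDS_PER_BOOK * TOKENS_PER_WORD
    return tokens / library_tokens


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU time if every GPU runs around the clock for `days`."""
    return num_gpus * days * 24


def compute_cost(hours: float, usd_per_hour: float) -> float:
    """Compute cost in dollars at a flat hourly rate."""
    return hours * usd_per_hour


if __name__ == "__main__":
    multiple = library_multiple(REPORTED_TOKENS)
    hours = gpu_hours(NUM_GPUS, TRAINING_DAYS)
    cost = compute_cost(hours, USD_PER_GPU_HOUR)
    print(f"Multiple of library text: {multiple:.1f}")
    print(f"GPU-hours:                {hours:,.0f}")
    print(f"Compute cost:             ${cost / 1e6:,.0f}M")
```

With these assumed inputs the script reports the token count as about 4 times the assumed library, 54 million GPU-hours, and roughly $108 million of compute. The cost lands near the reported figure only because of the chosen inputs; it moves in direct proportion to the assumed GPU count, duration, and hourly price.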
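The ranking step of RLHF is commonly turned into a training signal by fitting a reward model on pairwise comparisons. OpenAI's InstructGPT work, for example, expanded each human ranking of K outputs into all K-choose-2 pairs and minimized `-log(sigmoid(r_preferred - r_rejected))`; whether GPT-4 used exactly this recipe is not stated here. The minimal sketch below assumes that setup. The function names and the fixed `scores` dictionary are hypothetical stand-ins for a learned reward model, which in practice is a neural network trained by gradient descent on this loss before the language model is optimized against it.

```python
import math
from itertools import combinations
from typing import Callable, List, Tuple


def ranking_to_pairs(ranked: List[str]) -> List[Tuple[str, str]]:
    """Expand one human ranking (best first) into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first item of each pair
    # is always the one the evaluator ranked higher.
    return list(combinations(ranked, 2))


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """Compute -log(sigmoid(r_preferred - r_rejected)) in a numerically stable way."""
    d = r_preferred - r_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); branch so exp() never overflows.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def ranking_loss(ranked: List[str], reward: Callable[[str], float]) -> float:
    """Average pairwise loss over every pair implied by one ranking."""
    pairs = ranking_to_pairs(ranked)
    return sum(pairwise_loss(reward(a), reward(b)) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward-model scores for three candidate answers.
    scores = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": -1.0}
    human_ranking = ["answer_a", "answer_b", "answer_c"]  # best first

    agree = ranking_loss(human_ranking, scores.get)
    disagree = ranking_loss(list(reversed(human_ranking)), scores.get)
    print(f"loss when scores agree with the ranking:    {agree:.3f}")
    print(f"loss when scores disagree with the ranking: {disagree:.3f}")
```

The loss is small when the reward scores agree with the human ordering and grows as they contradict it; minimizing it is what pushes a reward model toward the evaluators' preferences.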

What it takes to train a model on the internet's knowledge.

Stage 1 of 6