Aiducation - Lesson

Training a large language model is one of the most resource-intensive computing tasks ever undertaken. GPT-4 was reportedly trained on over 13 trillion tokens, roughly equivalent to reading every book in the Library of Congress more than 100 times. The training run likely used tens of thousands of GPUs running continuously for months, consuming enough electricity to power a small city, at an estimated cost of more than $100 million. But raw data and compute are just the beginning. After pre-training comes instruction tuning on carefully curated examples, then reinforcement learning from human feedback (RLHF), in which human evaluators rank the model's outputs to align it with human preferences. Each stage fundamentally transforms what the model can do.

The Trillion Token Diet