Each new generation of large language models (LLMs) consumes a staggering amount of resources.
Meta, for instance, trained its new Llama 3 models with about 10 times more data and 100 times more compute than Llama 2. Amid a chip shortage, it used two 24,000-GPU clusters, each chip costing roughly as much as a luxury car. It used so much data in its AI work that it considered buying the publishing house Simon & Schuster to find more.
Afterward, even its executives wondered aloud whether the pace was sustainable.
“It is unclear whether we need to continue scaling or whether we need more innovation on post-training,” Ahmad Al-Dahle, Meta’s VP of GenAI, told me in an interview last week.