Marcus on AI
Some brief but important updates that very much support the themes of this newsletter:
- “Model and data size scaling are over.” Confirming the core of what I foresaw in “Deep Learning is Hitting a Wall” three years ago, Andriy Burkov wrote today on X, “If today’s disappointing release of Llama 4 tells us something, it’s that even 30 trillion training tokens and 2 trillion parameters don’t make your non-reasoning model better than smaller reasoning models. Model and data size scaling are over.”
- “occasional correct final answers provided by LLMs often result from pattern recognition or heuristic shortcuts rather than genuine mathematical reasoning”. A new study on math from Mahdavi et al. supports what Davis and I wrote yesterday about LLMs struggling with mathematical reasoning, and converges on similar conclusions: “Our study reveals that current LLMs fall significantly short of solving challenging Olympiad-level problems and frequently fail to distinguish correct mathematical reasoning from clearly flawed solutions. We also found that occasional correct final answers provided by LLMs often result from pattern recognition or heuristic shortcuts rather than genuine mathematical reasoning. These findings underscore the substantial gap between LLM performance and human expertise…”
- Generative AI may indeed be turning out to be a dud, financially, and the bubble may finally be deflating. Nvidia is down by a third so far in 2025, far more than the stock market itself. Meta’s woes with Llama 4 further confirm my March 2024 predictions that getting to a GPT-5 level would be hard, and that we would wind up with many companies with similar models and essentially no moat, along with a price war and profits that are modest at best. That is exactly where we are.