Google AI

One of the peculiar narratives in modern technology is how Google, with its vast resources, elite talent pool, and decade-long head start in machine learning, somehow found itself playing catch-up in the generative AI revolution. It’s like watching the world’s greatest chess player lose to a novice because they were distracted reading a book about chess strategy. But it seems Google has finally looked up from its book.

Google announced Gemini 2.5 Pro, and by all accounts, it’s not just incrementally better than competing models; it’s leaping ahead. The model is reportedly outperforming competitors by more than 40 Elo points on the Chatbot Arena leaderboard, which in this world is like showing up to a knife fight with a tactical nuclear weapon.1

This isn’t just a small step forward in a gradual progression. It appears to be the second-largest jump in top model performance in the history of the LMSYS Chatbot Arena leaderboard, behind only GPT-4 Turbo’s leap past Claude 1. Of course, that earlier jump happened before companies really understood they were competing on benchmarks, back in the prehistoric days of 2023.

The Business of Benchmarks

It’s worth pausing to consider the strange economics of AI model benchmarks. Companies invest billions in training these models, and then we measure their success with leaderboards that somewhat resemble video game high-score tables. There’s something delightfully absurd about trillion-dollar companies competing for position on volunteer-run evaluation platforms.2

The reason we care about benchmarks is that they’re our best proxy for model capabilities. And capabilities drive adoption, which drives revenue, which drives more investment in capabilities. It’s the circle of AI life, playing out in quarterly increments of compute spend that would make a small nation’s GDP blush.

Google’s performance on reasoning-focused benchmarks is particularly noteworthy. They’re reporting a score of 18.8% on “Humanity’s Last Exam” without search or tools, which is remarkable considering that just months ago, OpenAI was touting its Deep Research as groundbreaking for being able to tackle this kind of complex reasoning, and that required web access.

The Multimodal Moat

Google isn’t just beating others on language benchmarks. They’re maintaining advantages in multimodal capabilities (including audio) and context length. This is the AI equivalent of not just having the fastest car but also the most comfortable seats and the best sound system.

The technical explanation from Google is predictably vague: “a significantly enhanced base model with improved post-training.” This is like a chef describing their award-winning dish as “better ingredients, better cooking.” Technically accurate but deliberately uninformative.3

The VC-to-Viability Pipeline

For AI startups caught in the middle of this frontier model arms race, the competitive landscape just got more challenging. As Google, OpenAI, and others push state-of-the-art performance further, the gap between what’s possible with proprietary models and what’s available to startups via open-source alternatives continues to widen.

This creates a strange dynamic where venture capital pours into AI startups that are fundamentally dependent on technology controlled by the very tech giants they’re competing against. It’s like opening a restaurant where you have to buy all your ingredients from a competitor’s grocery store.4

Pricing: The Great Unknown

It’s worth noting that until we have API pricing, it’s difficult to make informed guesses about whether Gemini 2.5 Pro is a massive model like GPT-4.5. This highlights one of the fundamental tensions in the AI market: the relationship between model size, performance, and cost.

If Google can deliver superior performance at a lower cost than competitors, they could disrupt the current pricing equilibrium. If, however, they charge similar prices to OpenAI, then competition shifts to other dimensions like reliability, integration, and ecosystem.

The Future of AI Thinking

Perhaps most revealing is Google’s statement that they’re “building these thinking capabilities directly into all of our models.” This suggests a strategic shift away from specialized reasoning models toward making advanced reasoning a standard feature across their AI portfolio.

The irony here is that Google, which pioneered large-scale neural networks and transformer architectures that made modern AI possible, is now playing catch-up by enhancing the very technology they helped create. It’s like watching the inventor of the automobile sprint to catch a bus.

Will Gemini 2.5 Pro be enough to reestablish Google’s AI leadership? The benchmarks suggest yes, but the market is more complicated. OpenAI has first-mover advantage, Microsoft has distribution, and Anthropic has that ineffable quality of being not-Google and not-OpenAI, which counts for something in enterprise sales.

But the biggest winners in this AI arms race might be the customers, who get increasingly capable models to play with while the tech giants compete for dominance. It’s like watching Godzilla fight King Kong while you collect the gold coins they shake loose during the battle.

  1. Elo ratings, originally developed for chess, measure relative skill levels. A difference of 40 points means the higher-rated model would be expected to win about 56% of head-to-head comparisons, which at the top of a crowded leaderboard is actually quite significant.

  2. There’s probably a dissertation waiting to be written about how volunteer-run benchmarks have become the de facto standard for measuring progress in a trillion-dollar industry. 

  3. This vagueness is strategic. Google wants to telegraph its superiority without giving competitors a roadmap to replicate their approach. In the AI world, this is called “capabilities demonstration without capabilities transfer.” 

  4. For all the talk about AI democratization, the reality is that the most capable models remain controlled by a small handful of companies with the resources to train them. The open-source community continues making impressive progress, but the frontier keeps moving.
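
The win-rate arithmetic in footnote 1 can be sketched in a few lines. This is a minimal illustration, assuming the standard chess-style logistic Elo formula with a 400-point scale; the function name `expected_score` is hypothetical, and the leaderboard’s actual rating-fitting procedure may differ in its details.

```python
def expected_score(rating_gap: float) -> float:
    """Expected win probability for the higher-rated side.

    Assumes the textbook Elo logistic curve (base 10, 400-point scale).
    `rating_gap` is the higher rating minus the lower rating.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # Print the implied head-to-head win rate for a few rating gaps.
    for gap in (0, 40, 100, 400):
        print(f"{gap:>3} Elo -> {expected_score(gap):.1%}")
```

Under that assumption, a 40-point gap works out to a win rate of roughly 55.7%, while a 400-point gap would be about 90.9%. The curve is shallow near zero, which is why a few dozen points at the top of the table is a bigger deal than the raw percentage suggests.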