In August 2011, Marc Andreessen wrote “Why Software Is Eating the World”, an essay about how software was transforming industries, disrupting traditional businesses, and revolutionizing the global economy. Recently, Benedict Evans, a former a16z partner, gave a presentation on generative AI three years after ChatGPT’s launch. His argument, in short:

we know this matters, but we don’t know how.

In this article I will try to explain why I find his framing fascinating but incomplete. Evans structures technology history in cycles. Every 10-15 years, the industry reorganizes around a new platform: mainframes (1960s-70s), PCs (1980s), web (1990s), smartphones (2000s-2010s). Each shift pulls all innovation, investment, and company creation into its orbit. Generative AI appears to be the next platform shift, or it could break the cycle entirely. The range of outcomes spans from “just more software” to a single unified intelligence that handles everything. The pattern recognition is smart, but I think the current evidence points more clearly toward commoditization than Evans suggests, with value flowing up the stack rather than to model providers.

The hyperscalers are spending historic amounts. In 2025, Microsoft, Google, Amazon, and Meta will invest roughly $400 billion in AI infrastructure, more than global telecommunications capex. Microsoft now spends over 30% of revenue on capex, double what Verizon spends. What has this produced? Models that are simultaneously more capable and less defensible. When ChatGPT launched in November 2022, OpenAI had a massive quality advantage. Today, dozens of models cluster around similar performance. DeepSeek proved that anyone with $500 million can build a frontier model. Costs have collapsed. OpenAI’s API pricing has dropped by 97% since GPT-3’s launch, and every year brings an order of magnitude decline in the price of a given output.
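
To make those pricing claims concrete, here is a back-of-envelope sketch in Python. The $60-per-million-token starting point (roughly the original GPT-3 davinci list price) and the three-year horizon are illustrative assumptions, not current quotes; the point is how quickly a 97% cut and an order-of-magnitude annual decline compound.

```python
# Back-of-envelope on the pricing claims above. Starting price and horizon
# are illustrative, not current list prices.

start_price = 60.0  # $ per million tokens, roughly GPT-3-era davinci

# A 97% cut from that baseline:
after_cut = start_price * (1 - 0.97)
print(f"after a 97% cut: ${after_cut:.2f} per million tokens")  # $1.80

# "An order of magnitude per year" compounds even faster:
price = start_price
for year in range(1, 4):
    price /= 10
    print(f"year {year}: ${price:.3f} per million tokens")
# After three such years the same output costs 0.1% of the original,
# which is why raw capability is a shaky place to park a moat.
```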

Now, $500 million is still an enormous barrier. Only a few dozen entities globally can deploy that capital with acceptable risk. And GPT-4’s performance on complex reasoning tasks, Claude’s context windows of up to 200,000 tokens, and Gemini’s multimodal capabilities represent genuine breakthroughs. But the economic moat isn’t obvious to me (yet).

Evans uses an extended metaphor: automation that works disappears. In the 1950s, automatic elevators were AI. Today they’re just elevators. As Larry Tesler observed around 1970,

AI is whatever machines can’t do yet. Once it works, it’s just software.

The question: will LLMs follow this pattern, or is this different?

Current deployment shows clear winners but also real constraints. Software development has seen massive adoption, with GitHub reporting that 92% of developers now use AI coding tools. Marketing has found immediate uses generating ad assets at scale. Customer support has attracted investment, though with the caveat that LLMs produce plausible answers, not necessarily correct ones. Beyond these areas, adoption looks scattered. Deloitte surveys from June 2025 show that roughly 20% of U.S. consumers use generative AI chatbots daily, with another 34% using them weekly or monthly. Enterprise deployment is further behind. McKinsey data shows most AI “agents” remain in pilot or experimental stages. A quarter of CIOs have launched something. Forty percent don’t expect production deployment until 2026 or later.

But here, I think, is where Evans’ “we don’t know” approach misses something important. Consulting firms are booking billions in AI contracts right now. Accenture alone expects $3 billion in GenAI bookings for fiscal 2025. The revenue isn’t coming from the models. It’s coming from integration projects, change management, and process redesign. The pitch is simple: your competitors are moving on this, and you can’t afford to wait. If your competitors are investing and you’re not, you risk being left behind. If everyone invests and AI delivers modest gains, you’ve maintained relative position. If everyone invests and AI delivers nothing, you’ve wasted money but haven’t lost competitive ground. Evans notes that cloud adoption took 20 years to reach 30% of enterprise workloads and is still growing. New technology always takes longer than advocates expect. His most useful analogy is spreadsheets. VisiCalc in the late 1970s transformed accounting. If you were an accountant, you had to have it. If you were a lawyer, you thought “that’s nice for my accountant.” ChatGPT today has the same dynamic. Certain people with certain jobs find it immediately essential. Everyone else sees a demo and doesn’t know what to do with the blank prompt. This is right, and it suggests we’re early. But it doesn’t tell us where value will accumulate.
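
The investment pitch in the paragraph above is essentially a dominant-strategy argument, and it is easier to see laid out as a payoff table. A minimal sketch, with outcomes reduced to the qualitative labels used above (plus the implied fourth case) rather than measured payoffs:

```python
# The "you can't afford to wait" pitch, spelled out. Outcomes are
# qualitative labels, not measured payoffs; the fourth row is the
# implied case where you sit out and AI delivers nothing.

outcomes = {
    # (you_invest, rivals_invest, ai_pays_off): competitive result
    (False, True, True):  "left behind",
    (True,  True, True):  "relative position maintained",
    (True,  True, False): "money wasted, no ground lost",
    (False, True, False): "money saved, no ground lost",
}

for (you, rivals, pays_off), result in outcomes.items():
    print(f"invest={you!s:<5}  rivals={rivals!s:<5}  "
          f"ai_pays_off={pays_off!s:<5}  ->  {result}")

# Only one branch is catastrophic ("left behind"), and it only occurs
# when you sit out while rivals invest. That asymmetry is the pitch.
```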

The standard pattern for deploying technology goes in stages: (1) Absorb it (make it a feature, automate obvious tasks). (2) Innovate (create new products, unbundle incumbents). (3) Disrupt (redefine what the market is). We’re mostly in stage one. Stage two is happening in pockets. Y Combinator’s recent batches are overwhelmingly AI-focused, betting on thousands of new companies unbundling existing software (startups are attacking specific enterprise problems like converting COBOL to Java or reconfiguring telco billing systems). Stage three remains speculative. From an economic perspective, there’s the automation question: do you do the same work with fewer people, or more work with the same people? Companies whose competitive advantage was “we can afford to hire enough people to do this” face real pressure. Companies whose advantage was unique data, customer relationships, or distribution may get stronger. This is standard economic analysis of labor-augmenting technical change, and it probably holds here too.

All current recommendation systems work by capturing and analyzing user behavior at scale. Netflix needs millions of users watching millions of hours to train its recommendation algorithm. Amazon needs billions of purchases. The network effect comes from data scale. What if LLMs can bypass this? What if an LLM can provide useful recommendations by reasoning about conceptual relationships rather than requiring massive behavioral datasets? If I ask for “books like Pirsig’s Zen and the Art of Motorcycle Maintenance but more focused on Eastern philosophy,” a sufficiently capable LLM might answer well without needing to observe 100 million readers. It understands (or appears to understand) the conceptual space. I’m uncertain whether LLMs can do this reliably by the end of 2025. The fundamental question is whether they reason or pattern-match at a very sophisticated level. Recent research suggests LLMs may rely more on statistical correlations than true reasoning. If it’s mostly pattern-matching, they still need the massive datasets and we’re back to conventional network effects. If they can actually reason over conceptual spaces, that’s different. That would unbundle data network effects from recommendation quality. Recommendation quality would depend on model capability, not data scale. And if model capability is commoditizing, then the value in recommendations flows to whoever owns customer relationships and distribution, not to whoever has the most data or the best model. I lean toward thinking LLMs are sophisticated pattern-matchers rather than reasoners, which means traditional network effects still apply. But this is one area where I’m genuinely waiting to see more evidence.
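
To make the contrast concrete, here is a minimal sketch of the two approaches. The ratings matrix is a toy, and `llm()` is a placeholder for whatever chat model you would actually call; neither is a real recommendation system.

```python
# Two ways to answer "books like Pirsig, but more Eastern philosophy".
# Illustrative only: the ratings matrix is a toy and llm() is a stub.

import numpy as np

# 1) Collaborative filtering: quality scales with behavioral data.
#    Rows are users, columns are books. Real systems need millions of
#    rows before the similarities mean anything.
ratings = np.array([
    [5, 4, 0, 1],   # reader A
    [4, 5, 1, 0],   # reader B
    [0, 1, 5, 4],   # reader C
], dtype=float)

def item_similarity(r: np.ndarray) -> np.ndarray:
    """Cosine similarity between item (column) rating vectors."""
    norms = np.linalg.norm(r, axis=0, keepdims=True)
    return (r.T @ r) / (norms.T @ norms + 1e-9)

sim = item_similarity(ratings)
print("book most similar to book 0:", int(np.argsort(sim[0])[-2]))

# 2) LLM-style recommendation: no behavioral matrix at all, just a
#    question posed over the conceptual space.
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider of choice here")

prompt = ("Recommend three books like Pirsig's 'Zen and the Art of "
          "Motorcycle Maintenance', but more focused on Eastern philosophy.")
# recommendations = llm(prompt)
```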

Now, on AGI. The Silicon Valley consensus, articulated by Sutskever, Altman, Musk, and others, is that we’re on a clear path to artificial general intelligence in the next few years, possibly by 2027 or 2028. The argument goes: scaling laws continue to hold, we’re seeing emergent capabilities at each scale jump, and there’s no obvious wall before we reach human-level performance across all cognitive domains. I remain unconvinced. Not because I think AGI is impossible, but because the path from “really good at pattern completion and probabilistic next-token prediction” to “general reasoning and planning capabilities” seems less straightforward than the AI CEOs suggest. Current LLMs still fail in characteristic ways on tasks requiring actual causal reasoning, spatial reasoning, or planning over extended horizons. They’re getting better, but the improvement curve on these specific capabilities looks different from the improvement curve on language modeling perplexity. That suggests to me that we might need architectural innovations beyond just scaling, and those are harder to predict.

But let’s say I’m wrong. Let’s say AGI arrives by 2028. Even then, I find it hard to model why this would be tremendously economically beneficial specifically to the companies that control the models. Here’s why: we already have multiple competing frontier models (ChatGPT, Claude, Gemini, Microsoft’s offerings, and now DeepSeek). If AGI arrives, it likely arrives for multiple players at roughly the same time, given how quickly capabilities diffuse in this space. Multiple competing AGIs means price competition. Price competition in a product with near-zero marginal cost means prices collapse toward marginal cost. Where does economic value flow in that scenario? It flows to the users of AI, not the providers. Engineering firms using AGI for materials development capture value through better materials. Pharmaceutical companies using AGI for drug discovery capture value through better drugs. Retailers using AGI for inventory management capture value through better margins. The AGI providers compete with each other to offer the capability at the lowest price. This is basic microeconomics. You capture value when you have market power, either through monopoly, through differentiation, or through control of a scarce input. If models are commodities or near-commodities, model providers have none of these.
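
That last step, prices collapsing toward marginal cost, is easy to simulate. Here is a stylized Bertrand-style price war between two interchangeable providers, with made-up numbers:

```python
# Stylized Bertrand competition between interchangeable AGI providers.
# All numbers are made up; the point is the dynamic, not the values.

MARGINAL_COST = 0.01   # $ per million tokens, hypothetical
UNDERCUT = 0.95        # the losing provider undercuts by 5% each round

price_a, price_b = 10.0, 10.0
for _ in range(200):
    # whoever is cheaper takes the demand; the other undercuts next round
    if price_a <= price_b:
        price_b = max(MARGINAL_COST, price_a * UNDERCUT)
    else:
        price_a = max(MARGINAL_COST, price_b * UNDERCUT)

print(f"prices after the war: ${price_a:.3f} / ${price_b:.3f}")
# Both end up pinned at marginal cost. The surplus goes to the buyers
# of intelligence, not to its sellers.
```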

The counterargument is that one provider achieves escape velocity and reaches AGI first with enough of a lead that they establish dominance before others catch up. This is the OpenAI/Microsoft theory of the case. Maybe. But the evidence so far suggests capability leads are measured in months, not years. GPT-4 launched in March 2023 with a substantial lead. Within six months, Claude 2 was comparable. Within a year, multiple models clustered around similar capability. The diffusion is fast. Another counterargument is vertical integration. Maybe the hyperscalers that control cloud infrastructure plus model development plus customer relationships plus application distribution can capture value even if models themselves commoditize. This is more plausible, essentially the AWS playbook. Amazon didn’t make money by having the best database. They made money by owning the infrastructure, the customer relationships, and the entire stack from hardware to application platform. Microsoft is clearly pursuing this strategy with Azure plus OpenAI plus Copilot plus Office integration. Google has Search plus Cloud plus Gemini plus Workspace. This could work, but it’s a different thesis than “we have the best model.” It’s “we control the distribution and can bundle.”

Evans shows a scatter plot (Slide 34) of model benchmark scores from standard evaluations like MMLU and HumanEval. Leaders change weekly. The gaps are small. Meanwhile, consumer awareness doesn’t track model quality. ChatGPT dominates with over 700 million weekly active users not because it has the best model anymore, but because it got there first and built brand. If models are commodities, value moves up the stack to product design, distribution, vertical integration, and customer relationships. This is exactly what happened with databases. Oracle didn’t win because they had the best database engine. They won through enterprise sales, support contracts, and ecosystem lock-in. Microsoft didn’t beat them with a better database. They won by bundling SQL Server with Windows Server and offering acceptable performance at a lower price. The SaaS pattern suggests something similar happens here. The model becomes an input. The applications built on top, the customer relationships, the distribution, those become the valuable assets. Why do I think this pattern applies rather than, say, the search pattern where Google maintained dominance despite no fundamental technical moat? Two reasons: (1) Search had massive data network effects. Every search improved the algorithm, and Google’s scale meant they improved faster. LLMs have weaker data network effects because the pretraining data is largely static and publicly available, and fine-tuning data requirements are smaller. (2) Search had winner-take-all dynamics through defaults and single-answer demand. You pick one search engine and use it for everything. AI applications look more diverse. You might use different models for different tasks, or your applications might switch between models transparently based on price and performance. The switching costs are lower.
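
Here is what “switch between models transparently based on price and performance” might look like in practice, reduced to a few lines. Provider names, prices, and quality scores are placeholders, not real quotes; the point is how thin the switching layer can be.

```python
# A minimal model-router sketch: pick the cheapest provider that clears
# the task's quality bar. Names, prices, and scores are placeholders.

from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_mtok: float   # $ per million tokens (hypothetical)
    quality: float          # internal eval score, 0-1 (hypothetical)

PROVIDERS = [
    Provider("model-a", 5.00, 0.91),
    Provider("model-b", 2.50, 0.89),
    Provider("model-c", 0.60, 0.84),
]

def route(quality_bar: float) -> Provider:
    """Cheapest provider that clears the quality bar for this task."""
    eligible = [p for p in PROVIDERS if p.quality >= quality_bar]
    if not eligible:
        raise ValueError("no provider clears the bar; relax it or pay up")
    return min(eligible, key=lambda p: p.price_per_mtok)

print(route(0.85).name)   # routine task: cheapest adequate model wins
print(route(0.90).name)   # harder task: pay for more capability
```

When the router, not the end user, chooses the model, brand matters less and price-per-quality matters more, which is the low-switching-cost world described above.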

So where does this leave us? The technology exists and the underlying capabilities are real. But I think the current evidence points toward a world where value flows to applications and customer relationships, and where the $400 billion the hyperscalers are spending buys them competitive positioning rather than monopoly. The integrators are making money now by helping enterprises navigate uncertainty. Some of that will produce real productivity gains. Much of it is expensive signaling and competitive positioning. The startups unbundling existing software will see mixed results; the ones that succeed will do so by owning distribution or solving very specific problems where switching costs are high, not by having better access to AI. The biggest uncertainty is whether the hyperscalers can use vertical integration to capture value anyway, or whether the applications layer fragments and value flows to thousands of specialized companies. That depends less on AI capabilities and more on competitive dynamics, regulation, and whether enterprises prefer integrated platforms or best-of-breed solutions. My guess is we end up somewhere in between. The hyperscalers maintain strong positions through bundling and infrastructure control. A long tail of specialized applications captures value in specific verticals. The model providers themselves, unless they’re also infrastructure providers, struggle to capture value proportional to the capability they’re creating. But I’m genuinely uncertain, and that uncertainty is where the interesting bets are.

What makes Evans’ presentation valuable is precisely what frustrated me about it initially: his refusal to collapse uncertainty prematurely. I’ve spent this entire post arguing for a specific view of how value will flow in AI markets, but Evans is right that we’re pattern-matching from incomplete data. Every previous platform shift looked obvious in retrospect and uncertain in real time. The PC revolution, the internet boom, mobile, they all had credible skeptics who turned out wrong and credible bulls who were right for the wrong reasons. Evans’ discipline in laying out the full range of possibilities, from commodity to monopoly to something entirely new, is the intellectually honest position. I’ve made specific bets here because that’s useful for readers trying to navigate the space, but I’m more confident in my framework than in my conclusions. His presentation remains the best map of the territory. Go watch it, even if you end up disagreeing with how much certainty is warranted.

Original presentation linked in this post’s title.