# Philipp D. Dubach - Full Content Index

> This is the extended version of llms.txt with full article summaries inline. For the compact version, see [llms.txt](https://philippdubach.com/llms.txt).

Independent researcher and strategy consultant specializing in quantitative finance, AI infrastructure economics, and macroeconomic analysis.

- Last Updated: 2026-05-18
- Total Articles: 80
- Site: https://philippdubach.com/

## AI (37 articles)

### [Reconciling Enterprise AI Revenue](https://philippdubach.com/posts/reconciling-enterprise-ai-revenue/)

Published: 2026-05-17
Description: Four enterprise AI revenue figures span a 40x range. The $63.2B audit-grade floor is the only tier that defensibly underwrites $690B of hyperscaler capex.

Summary: Companion to the full research report, Reconciling Enterprise AI Revenue: A Methodological Crosswalk and Vendor-Level Census, 2025. The PDF carries the 68-vendor primary-source census, the six-tier disclosure framework, the per-step sourced deductions, and the netting of structural double-counts.

Key findings:
- Four widely cited 2025 enterprise AI revenue figures span 40x ($37B Menlo, $100-135B vendor run-rate sum, $307B IDC, $1.478T Gartner); each is correct within its own perimeter, so the gap between them is definitional rather than a measurement error.
- The audit-grade floor that defensibly underwrites $690B of 2026 hyperscaler capex is $63.2B narrow or $72.5B broad; looser numbers mix channel markup, third-party ARR claims, or full-retail device value into the same denominator.
- Capex-to-revenue runs at 10.9x on the narrow audit-grade basis, against a 1990s telecom peak of 3.5x; even counting only AI-incremental capex, the ratio sits at 6.3-7.9x, worse than the closest historical analogue.
- The Spread Index (audit-grade revenue over the Gartner umbrella) opens at 4.28% narrow / 4.90% broad in May 2026; if it stays below 5% through Q4 2026, capex coverage cannot improve from disclosure alone, and revenue itself must compound.

### [What Claude Thinks But Doesn't Say](https://philippdubach.com/posts/what-claude-thinks-but-doesnt-say/)

Published: 2026-05-11
Description: Anthropic's natural language autoencoders translate Claude's activations into readable text. The method works. The press release skips three structural problems.

Summary: Anthropic published a method for translating Claude's internal activations into readable English. It works well enough to ship. The press release skips three structural problems that the paper itself acknowledges.

Key findings:
- Natural language autoencoders train two model copies to round-trip activations through plain text; the only loss term is reconstruction accuracy, yet the resulting text reveals evaluation awareness in 26% of SWE-bench transcripts versus under 1% of real claude.ai traffic.
- The 'internal monologue' voice is partly inherited: warm-start data is generated by prompting Opus 4.5 to imagine what a model's thoughts would sound like, then locked in by a KL penalty during training.
- The 12-15% auditing-game win rate beats baselines by 4-5x, but only when training data is withheld; with access to the pretraining corpus, plain keyword search finds the same misalignment in seconds.
- The technique stays honest only as long as the target model is frozen; using NLA-readable activations as an RL reward would reintroduce Goodhart's law and erase the property that makes the method work.
- For enterprise model buyers and audit teams, NLAs raise the floor on vendor diligence: decode-at-position playbooks, justification of the chosen layer on the audit data, and disclosure of whether NLA-readable activations entered the training loop.
### [Two Anthropics](https://philippdubach.com/posts/two-anthropics/)

Published: 2026-05-09
Description: Two Anthropics: the safety lab Dario founded in 2021 and the $380B frontier lab it became. Same organism, two narratives the company itself has to reconcile. Three scenarios for how the tension resolves.

Summary: Anthropic was founded to be the safety lab that would pull rivals upward. Five years later it is one of the most aggressive frontier scalers, at $380 billion, and also the company whose own founding thesis treats frontier capability at this scale as the thing most likely to require the safeguards it says it builds.

Key findings:
- Anthropic went from a $124M Series A to a $380B valuation in five years on $10B in revenue; the safety lab and the frontier lab are now the same organism.
- The November 2023 refusal of OpenAI's CEO offer signaled that the safety thesis was real, but three years of 10x revenue growth have made it a different company from the one that refused.
- Three forcing functions already show the paradox is binding: a March 2026 federal injunction in the DoD case, the Pottinger WSJ op-ed on chip controls, and the August 2025 Nvidia feud.
- The allocator question is whether the safety narrative is a moat or a constraint at frontier scale; the deciding signal is how fast safety practices diffuse relative to how fast capability spreads.

### [Karpathy's Software 3.0 Playbook](https://philippdubach.com/posts/karpathys-software-3.0-playbook/)

Published: 2026-05-01
Description: Twelve lessons from Andrej Karpathy's Sequoia interview: Software 3.0, vibe coding versus agentic engineering, jagged intelligence, and why December 2024 was the inflection most people missed.

Summary: Andrej Karpathy is one of the few people who have both built modern AI and explained it for the rest of us. He co-founded OpenAI, ran computer vision at Tesla (where he got Autopilot working), and his courses on neural networks are some of the most-watched lectures on the internet.
He also has a habit of naming the era we're already in. "Vibe coding" was his. "Software 3.0" looks like the next one.

Key findings:
- Karpathy marks December 2024 as the inflection where agentic coding crossed from babysitting to trust, invisible to anyone whose mental model is still anchored to ChatGPT.
- The GPT-3.5 to GPT-4 chess jump shows that capability tracks whatever frontier labs feed into reinforcement learning, so verifiable domains automate first regardless of economic value.
- In Software 3.0 the unit of programming shifts from a function to a paragraph, the context window is the program, and the LLM is the interpreter.
- Vibe coding raises the floor for non-engineers, while agentic engineering raises the ceiling for professionals well past the old 10x benchmark.

### [F3ED Can't Call an Ace: Fixing a NeurIPS 2024 Tennis Model](https://philippdubach.com/posts/f3ed-cant-call-an-ace-fixing-a-neurips-2024-tennis-model/)

Published: 2026-04-29
Description: F3ED, the NeurIPS 2024 tennis shot detector, mislabels 73% of single-shot serve unforced errors. A 23-line scoreboard OCR reconciler fixes them.

Summary: I built a tennis broadcast pipeline this spring and ended up running F3ED, the NeurIPS 2024 shot detector, on a couple of ATP Challenger matches. F3ED is a good model. It also kept labeling clear aces as "unforced errors", which is what this post is about. Code: github.com/philippdubach/tennis-vision.

F3ED detects shots well. The catch is the outcome head, which has 4 classes: in, winner, forced-err, unforced-err. There's no class for ace, double_fault, or first_serve_fault. Those events aren't shot properties; they're score grammar, and they need state from outside the shot itself.
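A minimal sketch of the kind of score-state reconciliation described here, in Python. It is an illustration under stated assumptions, not the 23-line reconciler in the tennis-vision repository: the `ScoreSnapshot` type, the `reconcile_outcome` name, and the idea that the OCR scoreboard timeline has already been converted into cumulative points won by server and receiver are all hypothetical. The only inputs are F3ED's outcome label, the rally's shot count, and the score just before and just after the rally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreSnapshot:
    """Hypothetical scoreboard state: cumulative points won in the match.

    Assumes the OCR readings (point score, games, sets) were already turned
    into running point totals for the server and the receiver.
    """
    server_points: int
    receiver_points: int


def reconcile_outcome(
    f3ed_label: str, n_shots: int, before: ScoreSnapshot, after: ScoreSnapshot
) -> str:
    """Override F3ED's outcome for single-shot serve rallies using the score change."""
    # Only one-shot rallies that F3ED calls unforced errors are ambiguous;
    # every other label passes through unchanged.
    if n_shots != 1 or f3ed_label != "unforced-err":
        return f3ed_label

    server_delta = after.server_points - before.server_points
    receiver_delta = after.receiver_points - before.receiver_points

    if server_delta == 0 and receiver_delta == 0:
        # No point awarded, so a second serve follows: a first-serve fault.
        return "first_serve_fault"
    if server_delta == 1 and receiver_delta == 0:
        # Server won the point with the serve alone. Simplification: this is
        # treated as an ace, ignoring service winners the receiver touched.
        return "ace"
    # Receiver won the point: keep F3ED's unforced-error label.
    return f3ed_label
```

Because the rule only compares two score snapshots, it runs once per rally and costs a handful of integer operations, which is consistent with the override being negligible next to the detector itself.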
Key findings:
- F3ED labeled 11 single-shot serve rallies as unforced errors, but only 3 were genuine: 7 were first-serve faults and 1 was an ace, so 8 of 11 (73%) were mislabeled by tennis's own definitions.
- A 23-line reconciler that reads the scoreboard before and after each rally overrides F3ED's outcome label in microseconds, running once per rally against a 1 Hz OCR state timeline.
- Swapping YOLOv8m for YOLOv8x lifted top-player pose coverage from 70.0% to 97.6% at 1080p but did nothing at 720p, where the camera-far player drops below COCO's small-object scale buckets.
- Two filters cut CatBoost bounce false positives by 21-27%: a 400 ms temporal dedup and a court-locality check that drops bounces landing more than 200 px past the doubles alley.

### [Inside PRAGMA: Revolut's Foundation Model for Banking](https://philippdubach.com/posts/inside-pragma-revoluts-foundation-model-for-banking/)

Published: 2026-04-26
Description: Revolut's PRAGMA is a 1B-parameter encoder trained on 24B banking events. Reading the paper, comparing with Nubank's nuFormer, planning a rebuild.

Summary: This month, Revolut Research and NVIDIA published PRAGMA: an encoder-only transformer trained on 26 million user histories spanning 24 billion events and 207 billion tokens across 111 countries. To my knowledge it is the largest encoder backbone for consumer banking event data anyone has put on arXiv. Nine months earlier, Nubank had published nuFormer, a similar premise with the opposite architecture. Both ask the same question: can you train a transformer on raw transaction ledgers and replace the gradient-boosted-tree models running production credit, fraud, and recommendation pipelines?

Key findings:
- Revolut's PRAGMA scales from 10M to 1B parameters, pretrained on 24B events from 26M users across 111 countries.
- PRAGMA delivers +130% PR-AUC on credit scoring and +163% AUUC on uplift via LoRA fine-tuning of a shared backbone, per the figures reported in the paper.
- The paper itself reports a 47.1% drop on the F-0.5 anti-money-laundering downstream task, which the authors attribute to PRAGMA processing user histories in isolation rather than across users; this is a property of the research backbone, not a statement about Revolut's production AML stack.
- Nubank's nuFormer reaches a similar conclusion via a different architecture, suggesting the field is converging.

### [Do Not Disturb My Circles](https://philippdubach.com/posts/do-not-disturb-my-circles/)

Published: 2026-04-13
Description: AlphaFold cost under $1M to train. OpenAI spends $2.3B on inference. The chatbot era consumed the talent and compute that could have cured diseases.

Summary:

> If I'd had my way, we would have left it in the lab for longer and done more things like AlphaFold, maybe cured cancer or something like that.

Key findings:
- AlphaFold 2 trained on 128 TPUs for 11 days at an estimated cost under $1 million, less than what OpenAI spends on inference in a single day of its $2.3 billion annual bill.
- Big Tech spends 75x more on AI than the US federal government spends on AI for science: $250 billion on chatbot infrastructure versus $3.3 billion on AI for scientific research.
- Bell Labs produced 10 Nobel Prizes under AT&T's monopoly protection and one after the breakup introduced commercial pressure, the same dynamic Hassabis warned about.
- Hassabis told The Guardian he would have 'left AI in the lab for longer, done more things like AlphaFold, maybe cured cancer,' but ChatGPT's arrival forced him into the commercial race.

### [On-Device AI Models Will Be The New Reason to Upgrade Your Phone](https://philippdubach.com/posts/on-device-ai-models-will-be-the-new-reason-to-upgrade-your-phone/)

Published: 2026-03-25
Description: Smartphones haven't had a compelling upgrade story in years. On-device AI models, distilled from frontier systems like Gemini, are about to change that. Parameters are the new megapixels.
Summary: The iPhone 17 runs a 3-billion-parameter language model on-device at 30 tokens per second. Obviously, the average consumer has no idea what that sentence means, and Apple hasn't figured out how to make them care.

Key findings:
- The global smartphone replacement cycle has stretched to 3.5 years because cameras, screens, and processors stopped providing meaningful generational differences.
- Apple's 3-billion-parameter on-device Foundation Model runs at 30 tokens per second on an iPhone 15 Pro, but distilling from Google's full Gemini could push future on-device models far beyond that ceiling.
- Gartner projects GenAI smartphone spending will hit $393 billion in 2026, a 32% jump from 2025, with nearly 100% of premium devices featuring GenAI capabilities by 2029.
- Parameter counts risk becoming the next megapixel myth: a single number that marketing departments can inflate, while the actual on-device experience depends on quantization, distillation quality, and NPU architecture.

### [AI Can Now Design Drugs in Seconds; We Still Can't Tell You If They Work.](https://philippdubach.com/posts/ai-can-now-design-drugs-in-seconds-we-still-cant-tell-you-if-they-work./)

Published: 2026-03-18
Description: IsoDDE doubles AlphaFold 3 on hard benchmarks and beats physics-based gold standards. But no AI drug has FDA approval. What $4B in pharma deals actually mean.

Summary: No AI-discovered drug has ever received FDA approval. That sentence should sit uncomfortably next to every headline about Alphabet's drug discovery spinoff. On February 10, Isomorphic Labs, the Google DeepMind spinoff focused on computational drug design, released IsoDDE, its Drug Design Engine. This isn't a single model or an AlphaFold upgrade. IsoDDE is a unified in silico drug discovery system that runs protein structure prediction, ligand binding, affinity estimation, and pocket identification in concert, generating in seconds what used to take days of physics-based simulation.
On the hardest molecular prediction tasks, the "Runs N' Poses" benchmark designed to test generalization to unfamiliar proteins, IsoDDE hits a 50% success rate. AlphaFold 3 manages roughly 23%. On antibody-antigen modeling, IsoDDE beats AlphaFold 3 by 2.3× and the open-source Boltz-2 by 19.8×. On binding affinity prediction, it achieves a Pearson correlation of 0.85, beating the physics-based gold standard FEP+ at 0.78.

I would assume these improvements are large enough that binding prediction may no longer be the computational bottleneck in drug design.

Key findings:
- IsoDDE hits 50% on the hardest protein-ligand prediction benchmark versus 23% for AlphaFold 3, and beats the physics-based gold standard FEP+ on binding affinity with a 0.85 Pearson correlation.
- AI-discovered drugs show 80-90% Phase I success rates versus a 40-65% historical average, but Phase II efficacy rates remain roughly 40% for both AI and traditional drugs.
- Isomorphic's pharma deals total over $4 billion in headline value but only $82.5 million in upfront cash, a roughly 50:1 ratio that reflects how heavily pharma is betting on contingent outcomes.
- No AI-discovered drug has received FDA approval as of February 2026, and Isomorphic targets its first clinical candidates for late 2026.

### [The Last Architecture Designed by Hand](https://philippdubach.com/posts/the-last-architecture-designed-by-hand/)

Published: 2026-03-16
Description: The transformer's limits are now mathematical proofs, not empirical hunches. Hybrids are in production. AI is searching for its own replacement. Here's what comes after.

Summary:

> I bet there is another new architecture to find that is gonna be as big of a gain as transformers were over LSTMs.

Sam Altman, the CEO of the company most invested in the transformer, is telling a room of students it isn't the final form. So what comes after the transformer? He's probably right that something will, and the evidence is no longer anecdotal.
Several recent papers have proved that the transformer's worst properties are structural: not engineering problems to be fixed with better data or more compute, but mathematical lower bounds.

Key findings:
- Mathematical proofs now show that quadratic scaling, hallucination, and positional bias are structural properties of the transformer, not fixable with better training data or RLHF.
- Over 60% of frontier models released in 2025 use Mixture of Experts, and production hybrids like Jamba and Qwen3-Next blend attention with state space models at 3x throughput.
- AlphaEvolve found a 23% speedup in a key kernel of Gemini's own architecture, cutting Gemini's training time by 1%, and separately recovered 0.7% of Google's worldwide compute resources through a better data-center scheduling heuristic.
- OpenAI's inference spending hit $2.3 billion in 2024, 15x what it spent training GPT-4.5, meaning the economic center of gravity has already shifted from training to inference.

### [MCP vs A2A in 2026: How the AI Protocol War Ends](https://philippdubach.com/posts/mcp-vs-a2a-in-2026-how-the-ai-protocol-war-ends/)

Published: 2026-03-15
Description: MCP leads with 97M monthly SDK downloads and 10,000+ servers. A2A fills a different layer. Analysis of the agentic AI standards war with historical parallels.

Summary: On March 26, 2025, Sam Altman posted the following:

> people love MCP and we are excited to add support across our products.

MCP is Anthropic's Model Context Protocol. OpenAI is Anthropic's most direct competitor. Altman was endorsing a rival's standard. That post may be the most significant event in enterprise AI infrastructure of 2025. When your main competitor adopts your protocol, the war is close to over. I've been watching this play out since Anthropic launched MCP in November 2024, and I want to work through what's happening: who controls what, what "interoperability" means in practice, and whether any of this follows patterns we've seen before.
Key findings:
- MCP reached 10,000+ servers and 97 million monthly SDK downloads before A2A launched, compounding a five-month head start into a structural ecosystem lead.
- OpenAI adopting MCP in March 2025 mirrors the iMac's USB-only bet in 1998: one player so central to the ecosystem that its adoption made the standard inescapable.
- The agentic AI market is $7-8 billion in 2025, with analyst projections ranging from $50 billion to $199 billion by 2034 at 40-50% annual growth.
- 53% of MCP servers still rely on static credentials rather than OAuth, and a critical npm package vulnerability (CVE-2025-6514) exposed 437,000+ installations to shell injection.

### [AI Models Are the New Rebar](https://philippdubach.com/posts/ai-models-are-the-new-rebar/)

Published: 2026-03-11
Description: Qwen 3.5-35B runs on a gaming PC and matches Claude Sonnet 4.5. When the commodity version is 95% as good and 97% cheaper, you have a pricing problem.

Summary: Qwen 3.5-35B-A3B, a model released by Alibaba in February 2026, runs on a single consumer GPU with 24 gigabytes of VRAM. A secondhand RTX 4090, available for around $2,000, generates 60 to 100 tokens per second with it. On select benchmarks, per Alibaba's own evaluations, it matches or beats Claude Sonnet 4.5. The Qwen 3.5 Flash tier costs $0.10 per million input tokens through Alibaba's API. Claude Sonnet 4.5 costs $3.00. That's a 97 percent discount. For comparable performance.

Key findings:
- Qwen 3.5-35B matches Claude Sonnet 4.5 on select benchmarks at $0.10 per million input tokens versus $3.00, a 97 percent cost gap for comparable performance.
- The performance gap between open-source and proprietary AI models shrank from 8 percent to 1.7 percent in a single year, per the Stanford HAI 2025 AI Index.
- Reported figures on OpenAI's 2025 margins and losses (adjusted gross margin compression and a roughly $13.5 billion H1 2025 loss) have circulated in the trade press; the underlying numbers should be treated as press-sourced rather than primary disclosure until OpenAI publishes audited financials.
- AI inference prices decline at a median rate of 50x per year for equivalent performance, according to Epoch AI, a pace that dwarfs Moore's Law.

### [AI Capex Arms Race: Who Blinks First?](https://philippdubach.com/posts/ai-capex-arms-race-who-blinks-first/)

Published: 2026-03-08
Description: Alphabet's free cash flow is on track to fall 90% in 2026. Amazon's is at $11B. $690B in AI capex is cannibalizing the cash that justified these valuations.

Summary: Alphabet's free cash flow is projected to fall roughly 90% in 2026. Not because the business is in trouble. Because the company has committed to spending $83–93 billion more on capital expenditure than it did last year. That is what $660–690 billion in AI capex looks like up close. Amazon guided to $200 billion alone. Meta's long-term debt more than doubled to $58.7 billion to help finance its share. Goldman Sachs projects cumulative 2025–2027 spending across the Big 4 at $1.15 trillion, more than double the $477 billion spent over the prior three years combined. BofA credit strategists found this will consume 94% of operating cash flow after dividends and buybacks.
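The claim that free cash flow can collapse while the business stays healthy follows from the identity FCF = operating cash flow − capex, so the change in operating cash flow equals the change in FCF plus the change in capex. The Python sketch below backs out the operating-cash-flow change implied by this entry's figures. It assumes, without the summary saying so, that the $73 billion to roughly $8 billion FCF path in the findings and the $83–93 billion capex increase are measured on the same basis.

```python
def fcf_decline_pct(fcf_before: float, fcf_after: float) -> float:
    """Percentage fall in free cash flow."""
    return (fcf_before - fcf_after) / fcf_before * 100


def implied_ocf_change(fcf_before: float, fcf_after: float, capex_increase: float) -> float:
    """FCF = OCF - capex, so the change in OCF = change in FCF + change in capex."""
    return (fcf_after - fcf_before) + capex_increase


# Figures in $ billions, taken from this entry (assumed to share one accounting basis).
print(f"FCF decline: {fcf_decline_pct(73, 8):.1f}%")
for extra_capex in (83, 93):
    ocf_change = implied_ocf_change(73, 8, extra_capex)
    print(f"capex +{extra_capex}B -> implied OCF change: {ocf_change:+.0f}B")
```

Read this way, operating cash flow would still grow by roughly $18–28 billion; capex simply grows faster, which is how FCF falls about 89% without the underlying business weakening.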
Key findings:
- The Big 4 hyperscalers are on track to spend $610–665 billion in 2026, roughly 70% above 2025 levels, with Goldman Sachs projecting cumulative 2025–2027 spend at $1.15 trillion.
- Alphabet's free cash flow may fall from $73 billion to roughly $8 billion in 2026 as capex doubles; Amazon's is already compressed to $11 billion TTM, with $200B guidance ahead.
- Direct AI revenue covers roughly 15% of AI-specific capex: Sequoia's David Cahn calculated that the ecosystem needs $600 billion in annual revenue to justify current infrastructure spending, against the roughly $50–100 billion it actually generates.
- Inference costs are falling 50–200x per year (Epoch AI), meaning existing GPU infrastructure may become stranded faster than depreciation schedules assume.

### [The Physics Department That Slowed Down](https://philippdubach.com/posts/peter-thiels-physics-department/)

Published: 2026-03-02
Description: Peter Thiel says physics stalled in 1972. Then GPT-5.2 proved a new result in theoretical physics. The 75:1 AI compute gap between commerce and science.

Summary: On December 11, Jimmy Carr sat on the TRIGGERnometry podcast and delivered a riff that sounded like Peter Thiel's stagnation thesis filtered through a comedian's timing:

> Minus the screens from any room, we're living in the 1970s. Nothing's happened in physics since '72. String theory has not got us anywhere. But if you take the compute power of AI and point it at physics, what happens? We could have a world of plenty. I hope that's the world we live in. But it could go another way.
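Both headline ratios in this entry's findings can be reproduced from the numbers given. Note that the 76% is a relative decline in the growth rate of total factor productivity, not a fall in the level of productivity. A short Python check, using only the figures stated in the findings:

```python
def relative_decline_pct(before: float, after: float) -> float:
    """Relative fall from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Annual TFP growth in percent: 1947-1973 versus since 2004.
tfp_drop = relative_decline_pct(1.7, 0.4)

# Annual spending in $ billions: Big Tech AI spend versus the $3.3B federal figure cited.
spend_ratio = 250 / 3.3

print(f"TFP growth decline: {tfp_drop:.1f}%")
print(f"Spending ratio: {spend_ratio:.1f}x")
```

The first figure comes out near 76.5% and the second near 75.8x, matching the rounded values in the findings.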
Key findings:
- Total factor productivity growth fell from 1.7% annually (1947-1973) to 0.4% since 2004, a 76% decline in the growth rate that underpins Thiel's stagnation thesis.
- Big Tech spends 75x more on AI than the US federal government spends on AI for science: $250 billion versus $3.3 billion per year.
- GPT-5.2 derived a new result in theoretical physics on February 13, 2026, overturning a decades-old assumption about gluon scattering amplitudes.
- AI progressed from Olympiad geometry to IMO gold to a theoretical physics proof in 25 months, all on less than 1.3% of commercial AI compute.

### [Every Bulge Bracket Bank Agrees on AI](https://philippdubach.com/posts/every-bulge-bracket-bank-agrees-on-ai/)

Published: 2026-03-01
Description: I read 12 AI research reports from Goldman Sachs, JPMorgan, UBS, and 6 other banks. Here's the consensus they're pushing, and what they're not saying.

Summary: I spent the last week reading 12 bank AI research reports from nine of the world's largest financial institutions: Goldman Sachs, JPMorgan, Morgan Stanley (three separate reports), UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander. I wanted to understand how institutions that collectively manage trillions of dollars and employ thousands of analysts actually see this technology heading into 2026: where they agree, where they diverge, and what they're being less than forthcoming about.

Key findings:
- Not a single report from any of the nine institutions recommends reducing AI exposure. The absence of a bearish voice is itself the most important signal in the entire collection.
- The macro productivity estimates span from +0.7% to +15% TFP over ten years, drawn from the same underlying academic papers, cherry-picked to support nine different commercial narratives.
- Only ~10% of US companies are productively using AI and 42% have abandoned GenAI projects.
The gap between capex commitment and actual adoption is the most underweighted risk in the consensus.
- AI capex already contributed 1.4–1.5 percentage points to US GDP growth in H1 2025, making infrastructure spending the dominant driver of US economic expansion in that period.
- Morgan Stanley's historical data shows second-order beneficiaries outperform first-order enablers by 10–100x over long horizons, yet nearly every bank's current positioning favours first-order plays anyway.

### [When AI Labs Become Defense Contractors](https://philippdubach.com/posts/when-ai-labs-become-defense-contractors/)

Published: 2026-03-01
Description: The Anthropic-Pentagon standoff isn't an ethics story. It's a replay of the 1993 Last Supper that consolidated 51 defense primes into 5, at Silicon Valley speed.

Summary: Lockheed started by building Amelia Earhart's favorite plane. Then came a government loan guarantee in 1971 (the L-1011 TriStar nearly killed the company), the Cold War, decades of consolidation, and now a business that earns 92.5% of its revenue from government contracts, with the F-35 alone accounting for 26% of its $71 billion in annual sales. The process took about 50 years. AI labs becoming defense contractors will happen faster.

On February 27, 2026, two things happened within hours of each other. President Trump ordered every federal agency to "IMMEDIATELY CEASE all use of Anthropic's technology" after CEO Dario Amodei refused to strip safety constraints from Claude's Pentagon deployment, specifically prohibitions on mass domestic surveillance and fully autonomous weapons. Defense Secretary Pete Hegseth then labeled Anthropic a "Supply-Chain Risk to National Security," a designation previously reserved for foreign adversaries like Huawei and never before applied to an American company.
That evening, Sam Altman announced that OpenAI had signed a deal to deploy its models on the Pentagon's classified network, posting that the Department of War "displayed a deep respect for safety." (Whether that reflects the Pentagon's actual position or Altman's political optimism remains unclear for now.)

Key findings:
- The FY2026 Pentagon AI budget jumped to $13.4 billion from $1.8 billion, a more than 7x increase in a single budget cycle, putting it nearly level with Anthropic's entire annualized revenue of $14 billion.
- After the 1993 Last Supper, 51 prime defense contractors collapsed into 5 within four years. AI labs face the same consolidation logic, just faster: through classified network access and government-funded compute rather than M&A.
- IDIQ contracts account for 56% of DoD award dollars and run five years with extensions. Once a vendor is embedded in classified systems with a security-cleared workforce (243-day average clearance processing), switching costs become close to prohibitive.
- Palantir's trajectory previews the endgame: $4.48 billion FY2025 revenue (up 56%), 53.7% of it from government, and a $320 billion market cap that makes it worth nearly twice Boeing.

### [The Impossible Backhand](https://philippdubach.com/posts/the-impossible-backhand/)

Published: 2026-02-17
Description: AI converges to the mean by design. Ninth-power scaling costs and a 53-point gap on Humanity's Last Exam show domain expertise is appreciating, not declining.

Summary: In the latest issue of The AI Lab Newsletter, I featured a ByteDance Seedance 2.0 clip: two men playing tennis at what looked like an ATP tournament. Photorealistic. I probably wouldn't be able to tell it wasn't real footage if I didn't know. A co-worker who played junior pro-am tennis watched the same clip and said: "That backhand doesn't exist. Nobody plays it like that." His domain expertise spotted an error that probably fooled everyone else.
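The ninth-power scaling claim in this entry's findings turns into the "more than 500x" figure with one line of algebra. Writing compute cost as $C$ and error rate as $\varepsilon$, and assuming the empirical relation holds as a pure power law:

$$
C(\varepsilon) \propto \varepsilon^{-9}
\quad\Longrightarrow\quad
\frac{C(\varepsilon/2)}{C(\varepsilon)} = \left(\frac{1}{2}\right)^{-9} = 2^{9} = 512
$$

Halving the error rate therefore multiplies the required compute by 512 under this assumption.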
Key findings:
- Computational cost scales with the ninth power of improvement in practice: halving an AI error rate requires more than 500x the computational resources.
- On Humanity's Last Exam, top AI scores 37.5% versus roughly 90% for human domain experts, a 53-point gap, with AI calibration errors ranging from 34% to 89%.
- The Harvard/BCG study of 758 consultants found AI users produced 40% higher-quality work within AI's frontier but were 19 percentage points less accurate outside it when they blindly trusted the output.
- Oxford researchers found complementary effects of AI on jobs are 1.7x larger than substitution effects, supporting augmentation over replacement.

### [The SaaSpocalypse Paradox](https://philippdubach.com/posts/the-saaspocalypse-paradox/)

Published: 2026-02-13
Description: AI capex failure and AI replacing all software are mutually exclusive. Why the 2026 SaaSpocalypse is a $2 trillion pricing error, not an extinction event.

Summary: The market is simultaneously pricing AI capex failure and AI destroying all software. They cannot both be true.

Anthropic released 11 open-source plugins for Claude Cowork on January 30. Apache-2.0 licensed, file-based, running in a macOS-only research preview. Within a week, the IGV software ETF had fallen 32% from its September peak to a 52-week low of $79.65, roughly $2 trillion in market cap had evaporated, and hedge funds had made $24 billion shorting the sector. The RSI hit 18, the most oversold reading since 1990. JPMorgan titled its note "Software Collapse Broadens with Nowhere to Hide." Jefferies coined the term SaaSpocalypse. It was the worst software stock crash since the dot-com bust.
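The valuation inversion in this entry's findings compares two forward P/E multiples, and the 11.2 figure is their difference in multiple points ("turns"), not a ratio. A quick Python check keeps the two quantities apart, using only the multiples quoted in the findings:

```python
software_pe = 32.4  # recurring-revenue software, forward P/E
semis_pe = 43.6     # cyclical semiconductors, forward P/E

gap_turns = semis_pe - software_pe  # difference in multiple points ("turns")
premium = semis_pe / software_pe    # how many times richer semis trade

print(f"gap: {gap_turns:.1f} turns of earnings")
print(f"semis premium: {premium:.2f}x")
```

The gap is 11.2 turns, while semiconductors trade at roughly 1.35 times the software multiple.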
Key findings:
- The market is simultaneously punishing hyperscalers for weak AI capex returns and destroying software stocks on the premise that AI adoption is so strong it replaces all software; both cannot be true.
- The IGV software ETF fell 32% to an RSI of 18, the most oversold reading since 1990, while the sector delivered 17% aggregate earnings growth and every major name beat Q4 2025 estimates.
- Recurring-revenue software with 90%+ gross margins now trades at 32.4x forward earnings versus 43.6x for cyclical semiconductors, an 11.2-turn inversion that has not persisted historically.
- Goldman Sachs projects the application software market growing to $780 billion by 2030, with a16z arguing AI expands the addressable market from $350 billion in software to $6 trillion in white-collar services.

### [Don't Go Monolithic; The Agent Stack Is Stratifying](https://philippdubach.com/posts/dont-go-monolithic-the-agent-stack-is-stratifying/)

Published: 2026-02-10
Description: The enterprise AI agent stack is stratifying into six layers with different winners at each. Models commoditize; context (your organizational world model) compounds. A framework for agentic AI architecture decisions.

Summary: The defensible asset in enterprise AI is not the model. It's the organizational world model. Every major compute era decomposes into specialized layers with different winners at each level. Cloud split into IaaS, PaaS, and SaaS. The modern data stack split into ingestion, warehousing, transformation, and BI. Each time, specialists beat the generalists because the layers have fundamentally different economics: different rates of change, different capital requirements, different sources of lock-in.

Key findings:
- 37% of enterprises now use five or more AI models in production, making single-provider lock-in the new version of single-cloud risk.
- The enterprise AI agent stack is stratifying into six layers with different winners at each, and context, not models, sits in the zone with the highest lock-in that is hardest to rebuild.
- Most enterprise AI failures stem from shallow context: agents can retrieve the right documents but cannot reconstruct the reasoning processes humans follow to make decisions.
- Gartner predicts 40% of enterprise apps will feature AI agents by 2026 but warns that over 40% of agentic AI projects will be canceled by the end of 2027 due to unclear business value.

### [Where Mobile Money Goes Now](https://philippdubach.com/posts/where-mobile-money-goes-now/)

Published: 2026-02-07
Description: Apps overtook games in mobile IAP revenue for the first time in 2025, driven by $3.5B in GenAI growth. Analysis of Sensor Tower's State of Mobile 2026 report.

Summary: Sensor Tower's State of Mobile 2026 report confirms what had been building for years: the mobile app economy has permanently shifted. For the first decade of mobile, games made more money than everything else combined. Clash of Clans and Candy Crush built empires on freemium. King went public. Supercell sold for $10 billion. That changed in 2025.

Non-game applications now generate more in-app purchase revenue than games. Apps crossed $85.6 billion in 2025, up 21% year-over-year. Games managed $81.8 billion, barely moving from the year before.

Key findings:
- Non-game apps hit $85.6 billion in mobile IAP revenue in 2025, overtaking games ($81.8B) for the first time, with GenAI adding $3.5 billion as the single largest growth category.
- ChatGPT accounts for 40% of GenAI consumer app spending, making it the third-highest-grossing app globally behind TikTok and Google One.
- GenAI users cluster demographically with Reddit and X, not Instagram or Pinterest, meaning AI apps are scaling revenue while still reaching a niche audience.
- YouTube is the only app ranked #1 across every US age group from 18-24 to 45+, something TikTok, Instagram, and Facebook have never achieved. ### [Claude Opus 4.6: Anthropic's New Flagship AI Model for Agentic Coding](https://philippdubach.com/posts/claude-opus-4.6-anthropics-new-flagship-ai-model-for-agentic-coding/) Published: 2026-02-05 Description: Claude Opus 4.6 brings a 1M token context window, 68.8% ARC-AGI-2, and Agent Teams to Claude Code. Full benchmark comparison vs GPT-5.2 and Gemini 3 Pro with pricing analysis. Summary: Anthropic just released Claude Opus 4.6, the latest frontier AI model in the Claude family. It's a big upgrade over Opus 4.5 and probably the most agentic-focused LLM release from any lab this year. Key upgrades: better agentic AI coding capabilities (plans more carefully, sustains longer tasks, catches its own mistakes), a 1M token context window (a first for Opus-class models), and 128K output tokens. Pricing holds at $5/$25 per million tokens. Key findings: - Opus 4.6 scores 68.8% on ARC-AGI-2 versus 37.6% for Opus 4.5 and 54.2% for GPT-5.2, the largest single-generation leap on the benchmark that resists memorization - The 1M token context window scores 76% on MRCR v2 needle-in-a-haystack retrieval versus 18.5% for Sonnet 4.5, a different capability class rather than an incremental improvement - Opus 4.6 beats GPT-5.2 by 144 Elo points on GDPval-AA, the benchmark measuring real-world knowledge work across 44 professional occupations - Pricing holds at $5/$25 per million tokens versus GPT-5.2 at $2/$10, so the value proposition depends on whether agentic improvements translate to fewer retries and faster task completion ### [Buying the Haystack Might Not Work This Year](https://philippdubach.com/posts/buying-the-haystack-might-not-work-this-year/) Published: 2026-01-31 Description: a16z sees AI fundamentals thriving with 80% GPU utilization. AQR sees the CAPE at the 96th percentile. Both have data. Both may be right.
Summary: I've been reading the January 2026 state of markets reports from Andreessen Horowitz and AQR, and their conclusions on the AI bubble question in 2026 are almost impossible to reconcile. The a16z view is straightforward: AI fundamentals are real, and current prices reflect that reality. Their evidence is compelling. The top 50 private AI companies now generate $40.6 billion in annual revenue. Companies like ElevenLabs and Cursor are hitting $100 million ARR faster than Slack or Twilio ever did. GPUs are running at 80% utilization, compared to the 7% utilization rate for fiber optic cables during the dotcom bubble. This isn't speculation, they argue. It's demand exceeding supply. AQR looks at the same market and sees something else entirely. Their capital market assumptions put the U.S. CAPE ratio at the 96th percentile since 1980. Expected real returns for U.S. large cap equities over the next 5-10 years? 3.9%. For a global 60/40 portfolio, just 3.4%, well below the long-term average of roughly 5% since 1900. Risk premia, in their framework, are compressed across nearly every asset class. The narrative doesn't enter their models. a16z points to earnings growth. The market rally hasn't been driven by multiple expansion, they note, but by actual EPS growth. Tech P/E multiples sit around 30-35x, elevated but nowhere near the 70-80x of 2000. Tech margins have "lapped the field" at 25%+ compared to 5-8% for the rest of the S&P 500. The fundamentals, they insist, are doing the work. AQR's response would be that fundamentals always look good near peaks. Their research shows a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade. Compressed premia don't announce themselves with blaring headlines. They just quietly erode returns until investors notice they've been running in place.
Key findings: - GPU utilization runs at 80% versus 7% for fiber optic cables during the dotcom era, but the U.S. CAPE ratio sits at the 96th percentile since 1980, historically associated with low future returns - Cumulative hyperscaler capex is projected to hit $4.8 trillion by 2030, requiring roughly $1 trillion in annual AI revenue to clear a 10% hurdle rate - Non-U.S. developed markets offer expected returns around 5% versus 3.9% for U.S. large caps, a valuation gap that holds even if the AI story is true - AQR estimates a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade ### [Bandits and Agents: Netflix and Spotify Recommender Stacks in 2026](https://philippdubach.com/posts/bandits-and-agents-netflix-and-spotify-recommender-stacks-in-2026/) Published: 2026-01-30 Description: How hybrid recommender systems balance multi-armed bandits against LLM inference cost economics in 2026. A deep dive into Netflix recommendation algorithm architecture and Spotify's AI DJ recommender system. Summary: Hyperscalers spent over $350 billion on AI infrastructure in 2025 alone, with projections exceeding $500 billion in 2026. The trillion-dollar question is not whether machines can reason, but whether anyone can afford to let them. Hybrid recommender systems sit at the center of this tension. Large Language Models promised to transform how Netflix suggests your next show or how Spotify curates your morning playlist. Instead, the industry has split into two parallel universes, divided not by capability but by cost. 
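The split described here, cheap scoring for the whole catalogue and expensive reasoning only where it pays off, is usually built as a two-stage funnel. The sketch below is a toy illustration under stated assumptions, not Netflix's or Spotify's actual stack: embeddings are plain Python lists, `toy_reranker` is a hypothetical stand-in for an LLM call, and the 100/12 cut-offs are arbitrary.

```python
import heapq

def dot(u, v):
    """Cheap collaborative-filtering score: dot product of two embeddings."""
    return sum(a * b for a, b in zip(u, v))

def hybrid_funnel(user_vec, item_vecs, expensive_score, k_candidates=100, k_final=12):
    """Two-stage funnel: cheap retrieval over everything, costly rerank on a short list.

    item_vecs maps item id -> embedding. expensive_score is a hypothetical
    stand-in for an LLM or other costly reranker; it is only ever called on
    the k_candidates survivors of stage 1.
    """
    # Stage 1: score the full catalogue with the cheap dot product.
    shortlist = heapq.nlargest(
        k_candidates, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id])
    )
    # Stage 2: spend the expensive model only on the shortlist.
    return sorted(shortlist, key=expensive_score, reverse=True)[:k_final]

# Usage with toy data: 10,000 items with 2-dimensional embeddings.
catalogue = {f"item{i}": [(i % 7) / 7, (i % 5) / 5] for i in range(10_000)}
expensive_calls = []

def toy_reranker(item_id):
    """Placeholder for an expensive model; records each call so they can be counted."""
    expensive_calls.append(item_id)
    return int(item_id[4:]) % 97

top = hybrid_funnel([1.0, 0.5], catalogue, toy_reranker)
print(len(top), len(expensive_calls))  # 12 100
```

The point of the structure is the call count: the costly scorer runs 100 times, not 10,000, and that ratio is what makes reasoning-heavy reranking affordable at catalogue scale.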
Key findings: - A single LLM recommendation consumes thousands of tokens while a collaborative filtering dot product costs a fraction of a cent, making full LLM inference economically impossible at Netflix or Spotify scale - Netflix measures recommendation value by incrementality, the causal lift of showing a title versus not showing it, because a greedy algorithm that always surfaces high-probability titles collapses the discovery space - Spotify's AI DJ uses an agentic router that decides per-query whether to invoke the expensive LLM or fall back to fast keyword matching, an inference cost optimizer disguised as a product feature - Hyperscalers spent over $350 billion on AI infrastructure in 2025, but the industry consensus is a hybrid funnel: cheap models for millions of candidates, expensive reasoning only for the final dozen items a user sees ### [The Most Expensive Assumption in AI](https://philippdubach.com/posts/the-most-expensive-assumption-in-ai/) Published: 2026-01-26 Description: Sara Hooker's research challenges the trillion-dollar scaling thesis. Compact models now outperform massive ones as diminishing returns hit AI. Summary: Sara Hooker's paper arrived with impeccable timing. "On the slow death of scaling" dropped just as hyperscalers are committing another $500 billion to GPU infrastructure, bringing total industry investment in the scaling thesis to somewhere north of a trillion dollars. I've been tracking these capital flows for my own portfolio. Either Hooker is early to a generational insight or she's about to be very publicly wrong. The core argument is very simple: bigger is not always better. Llama-3 8B outperforms Falcon 180B. Aya 23 8B beats BLOOM 176B despite having only 4.5% of the parameters. These are not isolated flukes. Hooker plots submissions to the Open LLM Leaderboard over two years and finds a systematic trend where compact models consistently outperform their bloated predecessors.
The bitter lesson, as Rich Sutton framed it, was that brute-force compute always wins. Hooker's counter is that maybe we've been held hostage to "a painfully simple formula" that's now breaking down. Scaling laws, she notes, only reliably predict pre-training test loss. When you look at actual downstream performance, the results are "murky or inconsistent." The term "emergent properties" gets thrown around to describe capabilities that appear suddenly at scale, but Hooker points out this is really just a fancy way of admitting we have no idea what's coming. If your scaling law can't predict emergence, it's not much of a law. Key findings: - Compact models now outperform massive predecessors: Llama-3 8B beats Falcon 180B, and Aya 23 8B beats BLOOM 176B with 4.5% of the parameters - Scaling laws only reliably predict pre-training loss, not downstream performance, and "emergent properties" is largely a label for capabilities the laws failed to predict - Hedge fund short interest in AI-adjacent utilities sits at the 99th percentile relative to the past five years - Frontier labs are incorporating classical symbolic tools on CPUs, meaning the age of brute-force scaling may be ending ### [Enterprise AI Strategy is Backwards](https://philippdubach.com/posts/enterprise-ai-strategy-is-backwards/) Published: 2026-01-22 Description: 85% of AI projects fail. Only 26% translate pilots to production. The winners automate the coordination layer where employees spend 57% of their workday. Summary: That’s the claim made by LinkedIn co-founder Reid Hoffman. It’s a bold assertion, so I set out to investigate whether the data supports it. The result is a comprehensive report, backed by more than 30 sources. You can download the full report and the accompanying presentation for free.
Key findings: - 85% of enterprise AI projects fail; only 26% of companies translate pilots to production - Employees spend 57% of their workday on coordination, the layer AI should target first - Language models bridge messy communication to structured data: transcripts to CRM fields at 99% accuracy, 30% higher win rates - AI gains compound when knowledge capture becomes shareable across the organization ### [Does AI mean the demand on labor goes up?](https://philippdubach.com/posts/does-ai-mean-the-demand-on-labor-goes-up/) Published: 2026-01-15 Description: AI was supposed to free us. The Jevons paradox plays out in real time: efficiency expands workload, not leisure. 77% of workers say AI added to their work. Summary: Joe Weisenthal from Bloomberg, this week: All my shower thoughts now are about designing efficient workflows for synthesizing, collecting, labeling and annotating data. Same. Since I started building every app and tool I thought would make my life easier, my workflow more efficient, I haven't stopped. Apparently non-developers are now writing apps instead of buying them. This is the AI productivity paradox in miniature: the tools get better and we do more, not less. 
Key findings: - Workers in AI-exposed occupations now work roughly 3 extra hours per week, and leisure time has dropped by the same amount, according to NBER research - 77% of employees say AI tools have added to their workload, not reduced it, per Upwork's survey data - Only 21% of employees use time saved by AI for personal life, with the rest reinvesting it directly back into work - The Jevons paradox from 1865 predicted this: more efficient steam engines increased coal consumption, and more efficient AI tools are increasing work output expectations the same way ### [Social Media Success Prediction: BERT Models for Post Titles](https://philippdubach.com/posts/social-media-success-prediction-bert-models-for-post-titles/) Published: 2026-01-10 Description: Training RoBERTa to predict Hacker News success revealed temporal leakage inflating metrics. How temporal splits, calibration, and regularization fix it. Summary: Last week I published a Hacker News title sentiment analysis based on the Attention Dynamics in Online Communities paper I have been working on. The discussion on Hacker News raised the obvious question: can you actually predict what will do well here? The honest answer is: partially. Timing matters. News cycles matter. Who submits matters. Weekend versus Monday morning matters. Most of these factors aren't in the title. But titles aren't nothing either. "Show HN" signals something. So do phrasing, length, and topic selection. The question becomes: how much signal can you extract from 80 characters?
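Calibration is usually measured with Expected Calibration Error (ECE). Predictions are bucketed by confidence, and each bucket's mean predicted probability is compared with its observed hit rate, weighted by bucket size. Below is a minimal sketch of the common equal-width-bin variant. The 10-bin default is an assumption, because the post does not say how its ECE was binned.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Expected Calibration Error with equal-width probability bins.

    ECE = sum over bins of (bin size / N) * |mean predicted prob - observed hit rate|.
    probs are predicted probabilities of the positive class; labels are 0/1 outcomes.
    """
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # p == 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(p for p, _ in bucket) / len(bucket)
        hit_rate = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(avg_conf - hit_rate)
    return ece

# A model that says 25% and is right 1 time in 4 is perfectly calibrated.
print(expected_calibration_error([0.25] * 4, [1, 0, 0, 0]))  # 0.0
# A model that says 90% but is right only half the time is badly over-confident.
print(round(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5), 3))  # 0.4
```

A post-hoc calibrator such as isotonic regression leaves the ranking (and so AUC) essentially intact while remapping scores so that this gap shrinks.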
Key findings: - A fine-tuned RoBERTa model achieves 0.685 AUC predicting Hacker News success from titles alone, with the top 10% of predictions hitting at 1.9x the random baseline - Switching from random to temporal train/test splits dropped the ensemble AUC from 0.714 to 0.693 and collapsed SBERT's contribution from 0.35 weight to 0.10, exposing temporal leakage - Increasing dropout to 0.2, weight decay to 0.05, and freezing the 6 lowest transformer layers cut the train-test overfitting gap by 61% while barely affecting test performance - Isotonic calibration reduced Expected Calibration Error from 0.089 to 0.043, meaning predicted probabilities now track observed hit rates much more closely ### [Beyond Vector Search: Why LLMs Need Episodic Memory](https://philippdubach.com/posts/beyond-vector-search-why-llms-need-episodic-memory/) Published: 2026-01-09 Description: Context windows aren't memory. Explore EM-LLM's episodic architecture, knowledge graph tools like Mem0 and Letta, and why vectors fail for sequential data. Summary: You've seen this message before: Copilot pausing. In long sessions, it happens often enough that I started wondering what's actually going on in there. Hence this post. The short answer: context windows grew larger. Claude handles 200K tokens, Gemini claims a million. But bigger windows aren't memory. They're a larger napkin you throw away when dinner's over.
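EM-LLM's episodic segmentation starts from surprise. A token's surprise is the negative log-probability the model assigned to it, and an episode boundary goes where surprise spikes well above its recent level. The sketch below covers only that thresholding step and loosely follows the paper's mean-plus-γ·std rule over a trailing window. Several things are assumptions here: per-token probabilities are taken as given, the window size and γ are illustrative, and the paper's later graph-based boundary refinement is omitted.

```python
import math
import statistics

def surprise_boundaries(token_probs, window=8, gamma=1.0):
    """Mark episode boundaries where a token is unusually surprising.

    token_probs[i] is the probability the model gave to the token actually seen
    at step i. Surprise is -log(p). A boundary is placed at i when its surprise
    exceeds mean + gamma * stdev of the preceding `window` surprises.
    """
    surprise = [-math.log(p) for p in token_probs]
    boundaries = []
    for i in range(window, len(surprise)):
        recent = surprise[i - window:i]
        threshold = statistics.fmean(recent) + gamma * statistics.pstdev(recent)
        if surprise[i] > threshold:
            boundaries.append(i)
    return boundaries

def split_into_episodes(tokens, boundaries):
    """Cut the token sequence at each boundary index."""
    cuts = [0, *boundaries, len(tokens)]
    return [tokens[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]

# Toy sequence: predictable tokens, one topic jump at index 10, then predictable again.
probs = [0.9] * 10 + [0.01] + [0.9] * 5
tokens = [f"t{i}" for i in range(len(probs))]
cuts = surprise_boundaries(probs)
print(cuts)  # [10]
print([len(ep) for ep in split_into_episodes(tokens, cuts)])  # [10, 6]
```

Each resulting episode can then be stored and retrieved as a unit. Unlike chunking by a fixed token count, this ties memory boundaries to the content of the conversation.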
Key findings: - Context windows grew to 200K tokens (Claude) and 1M (Gemini), but bigger windows are not memory because they lack persistence, temporal awareness, and the ability to update facts across sessions - EM-LLM segments conversation into episodes using surprise detection, and its event boundaries correlate with where humans perceive breaks in experience - HeadKV found you can discard 98.5% of a transformer's key-value cache by keeping only the attention heads that matter for memory, with almost no quality loss - Mem0 reports 80-90% token cost reduction with a 26% quality improvement by replacing raw chat history with structured memory, though the claim has not been independently validated ### [65% of Hacker News Posts Have Negative Sentiment, and They Outperform](https://philippdubach.com/posts/65-of-hacker-news-posts-have-negative-sentiment-and-they-outperform/) Published: 2026-01-07 Description: Sentiment analysis of 32,000 Hacker News posts shows 65% skew negative, and negative posts earn 27% more points than average. Six transformer and LLM models tested, full data included. Summary: Negativity Bias and Engagement on Hacker News. This Hacker News sentiment analysis began with a simple observation: posts with negative sentiment average 35.6 points on Hacker News. The overall average is 28 points. That's a 27% performance premium for negativity. This finding comes from an empirical study I've been running on HN attention dynamics, covering decay curves, preferential attachment, survival probability, and early-engagement prediction. The preprint is available on SSRN. I already had a gut feeling. Across 32,000 posts and 340,000 comments, nearly 65% register as negative. This might reflect a classifier miscalibrated toward negativity, yet the pattern holds across six different models.
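A Gini coefficient is the usual one-number summary of how concentrated scores like these are. It is 0 when every post gets the same score and approaches 1 when a single post takes everything. The sketch below uses the standard sorted-values formula. The `points` list is made up for illustration and is not HN data. The last line simply re-derives the 27% premium from the 35.6 and 28 point averages quoted in the summary.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal, near 1 = concentrated).

    Sorted-values formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical score list: most posts get a handful of points, one gets hundreds.
points = [1, 1, 2, 3, 5, 8, 13, 400]
print(round(gini(points), 3))

# The negativity premium: negative-post average over the overall average.
print(round((35.6 / 28 - 1) * 100))  # 27
```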
Key findings: - Across 32,000 Hacker News posts, nearly 65% register as negative sentiment, a pattern that holds across six different models including DistilBERT, RoBERTa, and Llama 3.1 8B - Negative posts average 35.6 points versus 28 overall, a 27% engagement premium for negativity, though most HN negativity is substantive critique rather than toxicity - Score distribution follows a power law with high Gini coefficients, meaning a small fraction of posts capture most attention while the majority get almost none ### [RSS Swipr: Find Blogs Like You Find Your Dates](https://philippdubach.com/posts/rss-swipr-find-blogs-like-you-find-your-dates/) Published: 2026-01-05 Description: Build an open-source ML RSS reader with a swipe interface. Uses MPNet embeddings and Thompson sampling for personalized feeds that escape the filter bubble. Summary: Algorithmic timelines are everywhere now. But I still prefer the control of RSS. Readers are good at aggregating content but bad at filtering it. What I wanted was something borrowed from dating apps: instead of an infinite list, give me cards. Swipe right to like, left to dislike. Then train a model to surface what I actually want to read. So I built RSS Swipr. Key findings: - MPNet embeddings combined with a Hybrid Random Forest achieve 75.4% ROC-AUC on article preference prediction, up from 66% with hand-engineered features alone. - Thompson sampling allocates 80% of shown articles to predicted preferences and 20% to random exploration, preventing filter bubbles while keeping recommendations useful. - The entire system runs locally at zero cost: Python/Flask backend, vanilla JS frontend, SQLite storage, and free-tier Google Colab for GPU training. ### [Apple's AI Bet: Playing the Long Game or Missing the Moment?](https://philippdubach.com/posts/apples-ai-bet-playing-the-long-game-or-missing-the-moment/) Published: 2025-12-30 Description: Apple's $157B cash pile and Gemini-powered Siri shift show a restrained AI strategy.
Analysis of whether Apple wins as AI models become commodities. Summary: The Information published a piece today arguing that Apple's restrained AI approach may finally pay off in 2026. The thesis: while OpenAI, Google, and Meta pour hundreds of billions into data centers and model training, Apple has kept its powder dry, sitting on $157 billion in cash and marketable securities as of Q4 2025. If the AI spending bubble deflates, Apple's position looks rather clever. This piqued my interest from a strategy point of view. Apple hasn't been absent from AI. They've been making a specific bet that large language models will commoditize, and that value will flow to distribution and customer relationships rather than to whoever has the best model. The revamped Siri expected in spring 2026 will reportedly be powered by Google's Gemini through a deal worth $1 billion annually. The custom Gemini model will run on Apple's Private Cloud Compute servers. This is consistent with Apple's history. They didn't build their own search engine. They took Google's money to be the default on Safari. John Giannandrea's retirement earlier this month, with Siri now under Mike Rockwell, signals internal recognition that something had to change. Key findings: - Apple sits on $157B in cash, betting that LLMs will commoditize, while Microsoft, Google, Amazon, and Meta spend roughly $400B collectively on AI infrastructure in 2025. - The revamped Siri will be powered by Google's Gemini through a $1B annual deal running on Apple's Private Cloud Compute servers, treating models as interchangeable utilities. - AI API pricing has dropped 97% since GPT-3's launch, and Apple can push features to 2.3B active devices via software updates, favoring distribution over R&D spending. ### [Is AI Really Eating the World? AGI, Networks, Value [2/2]](https://philippdubach.com/posts/is-ai-really-eating-the-world-agi-networks-value-2/2/) Published: 2025-11-24 Description: AGI predictions miss the point.
Multiple competing models mean a price war. Value flows to applications, customer relationships, and vertical integrators. Summary: Start by reading "Is AI Really Eating the World? What we've Learned [1/2]". All current recommendation systems work by capturing and analyzing user behavior at scale. Netflix needs millions of users watching millions of hours to train its recommendation algorithm. Amazon needs billions of purchases. The network effect comes from data scale. What if LLMs can bypass this? What if an LLM can provide useful recommendations by reasoning about conceptual relationships rather than requiring massive behavioral datasets? If I ask for "books like Pirsig's Zen and the Art of Motorcycle Maintenance but more focused on Eastern philosophy," a sufficiently capable LLM might answer well without needing to observe 100 million readers. It understands (or appears to understand) the conceptual space. I'm uncertain whether LLMs can do this reliably by the end of 2025. The fundamental question is whether they reason or pattern-match at a very sophisticated level. Recent research suggests LLMs may rely more on statistical correlations than true reasoning. If it's mostly pattern-matching, they still need the massive datasets and we're back to conventional network effects. If they can actually reason over conceptual spaces, that's different. That would unbundle data network effects from recommendation quality. Recommendation quality would depend on model capability, not data scale. And if model capability is commoditizing, then the value in recommendations flows to whoever owns customer relationships and distribution, not to whoever has the most data or the best model. I lean toward thinking LLMs are sophisticated pattern-matchers rather than reasoners, which means traditional network effects still apply. But this is one area where I'm genuinely waiting to see more evidence.
Key findings: - Even if AGI arrives by 2028, multiple competing providers will likely reach it simultaneously, meaning prices collapse toward marginal cost and value flows to AI users, not providers. - GPT-4 launched with a substantial capability lead in March 2023, but within six months Claude 2 was comparable, suggesting frontier leads are measured in months, not years. - LLMs likely remain sophisticated pattern-matchers rather than true reasoners, which means traditional data network effects still apply and recommendation incumbents keep their moats. - The hyperscaler playbook (infrastructure + model + distribution + bundling) is more plausible for value capture than the 'best model wins' thesis, mirroring how AWS, not databases, captured cloud value. ### [Is AI Really Eating the World? [1/2]](https://philippdubach.com/posts/is-ai-really-eating-the-world-1/2/) Published: 2025-11-23 Description: Hyperscalers spend $400B on AI, API prices drop 97%, and DeepSeek builds frontier models for $500M. Value is flowing to applications, not model providers. Summary: In August 2011, Marc Andreessen wrote "Why Software Is Eating the World", an essay about how software was transforming industries, disrupting traditional businesses, and revolutionizing the global economy. Recently, Benedict Evans, a former a16z partner, gave a presentation on the generative AI platform shift three years after ChatGPT's launch. His argument in short: we know this matters, but we don't know how. In this article I will try to explain why I find his framing fascinating but incomplete, and why the evidence points toward AI model commoditization rather than durable competitive advantages at the model layer. Evans structures technology history in cycles. Every 10-15 years, the industry reorganizes around a new platform: mainframes (1960s-70s), PCs (1980s), web (1990s), smartphones (2000s-2010s). Each shift pulls all innovation, investment, and company creation into its orbit. 
Generative AI appears to be the next platform shift, or it could break the cycle entirely. The range of outcomes spans from "just more software" to a single unified intelligence that handles everything. The pattern recognition is smart, but I think the current evidence points more clearly toward commoditization than Evans suggests, with value flowing up the AI value chain to applications rather than to model providers. Key findings: - Hyperscalers are spending $400B on AI infrastructure in 2025, more than global telecom capex, while API pricing has dropped 97% since GPT-3, pointing to rapid commoditization. - 92% of developers now use AI coding tools, but 40% of CIOs do not expect production AI agent deployment until 2026 or later, showing adoption is deep in pockets but shallow overall. - Consulting firms like Accenture are booking $3B+ in GenAI revenue, but the money comes from integration and process redesign, not from the models themselves. - DeepSeek proved a frontier model can be built for $500M, collapsing the assumption that only the richest labs can compete at the capability frontier. ### [Weather Forecasts Have Improved a Lot](https://philippdubach.com/posts/weather-forecasts-have-improved-a-lot/) Published: 2025-11-22 Description: Four-day forecasts now match one-day accuracy from 30 years ago. How AI models like WeatherNext 2 use CRPS training to preserve extreme weather signals. Summary: Reading the press release for Google DeepMind's WeatherNext 2, I wondered: have weather forecasts actually improved over the past years? Turns out they have, dramatically. A four-day forecast today matches the accuracy of a one-day forecast from 30 years ago. Hurricane track errors that once exceeded 400 nautical miles for 72-hour forecasts now sit below 80 miles. The European Centre for Medium-Range Weather Forecasts reports three-day forecasts now reach 97% accuracy, with seven-day forecasts approaching that threshold. 
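CRPS (continuous ranked probability score) scores a whole forecast distribution against the observed value. For an ensemble it has a simple empirical form: the mean absolute error of the members minus half their mean pairwise spread. The sketch below is the generic textbook estimator, not Google's training code. The toy numbers show why a sharp ensemble that keeps an extreme member can beat the single mean-valued forecast that minimizes squared error.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS for an ensemble forecast of one scalar (lower is better).

    CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|
    With a single member this reduces to plain absolute error.
    """
    m = len(members)
    skill = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread

# Toy setup: three members say "calm" (0.0), one says "extreme" (10.0).
sharp = [0.0, 0.0, 0.0, 10.0]
blurred = [2.5]  # the ensemble mean, i.e. what a squared-error fit hedges toward

# If the extreme event happens, the sharp ensemble scores better...
print(crps_ensemble(sharp, 10.0), crps_ensemble(blurred, 10.0))  # 5.625 7.5
# ...and it also scores better when nothing happens.
print(crps_ensemble(sharp, 0.0), crps_ensemble(blurred, 0.0))  # 0.625 2.5
```

The blurred forecast never predicts the extreme value at all, which is the failure mode for cyclones and heat waves.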
Key findings: - A four-day weather forecast today matches the accuracy of a one-day forecast from 30 years ago, with three-day forecasts now reaching 97% accuracy. - WeatherNext 2 generates forecasts in under a minute on a single TPU, compared to hours on a supercomputer for physics-based models, an up to 10,000x speed improvement. - CRPS training preserves sharp spatial features and extreme values that L2 losses blur, solving a key weakness of earlier neural weather models for cyclone and heat wave prediction. - Hurricane track errors dropped from 400+ nautical miles to below 80 miles for 72-hour forecasts, with Google's model performing well against actual hurricane paths this season. ### [AI Models as Standalone P&Ls](https://philippdubach.com/posts/ai-models-as-standalone-pls/) Published: 2025-11-09 Description: OpenAI lost $11.5B in one quarter. But Anthropic CEO Dario Amodei argues each AI model is independently profitable. Here's why the math is complicated. Summary: Microsoft reported earnings for the quarter ended Sept. [...] buried in its financial filings were a couple of passages suggesting that OpenAI suffered a net loss of $11.5 billion or more during the quarter (per press reporting; figure is press-sourced rather than from audited disclosures). For every dollar of revenue, they're allegedly spending roughly $5 to deliver the product. These OpenAI losses initially sound like a joke about "making it up on volume," but they point to a more fundamental problem facing OpenAI and its competitors. AI companies are locked into continuously releasing more powerful (and expensive) models. If they stop, open-source alternatives will catch up and offer equivalent capabilities at substantially lower costs. This creates an uncomfortable dynamic. If your current model requires spending more than you earn just to fund the next generation, the path to profitability becomes unclear—perhaps impossible. 
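Amodei's per-model framing is easiest to see with toy numbers. The sketch below assumes a $100M first training run, that each model earns 2x its training cost, and that each generation costs 10x the previous one. It further assumes that revenue lands one year after training and that the horizon is three generations. Those timing and horizon choices are simplifications for illustration and do not come from any company's accounts.

```python
def model_cohorts(first_cost=100, cost_growth=10, revenue_multiple=2, generations=3):
    """Toy per-model vs. company-level P&L (all figures in $M, illustrative only).

    Assumptions: model k is trained in year k at first_cost * cost_growth**k and
    earns revenue_multiple times that cost in year k + 1.
    Returns (profit of each model in isolation, company cash flow per year).
    """
    years = {}
    per_model = []
    for k in range(generations):
        cost = first_cost * cost_growth ** k
        revenue = revenue_multiple * cost
        per_model.append(revenue - cost)              # the model viewed as its own P&L
        years[k] = years.get(k, 0) - cost             # training spend lands in year k
        years[k + 1] = years.get(k + 1, 0) + revenue  # its revenue lands in year k + 1
    return per_model, [years[y] for y in sorted(years)]

per_model, yearly = model_cohorts()
print(per_model)  # [100, 1000, 10000]
print(yearly)     # [-100, -800, -8000, 20000]
```

Every model clears its own cost, yet the company is cash-negative in every year it trains a successor. Cash flow turns positive only in the final year, when no successor is being trained, which is the dynamic the summary describes.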
Key findings: - OpenAI lost $11.5B in one quarter, spending roughly $5 for every $1 of revenue because each new model generation costs about 10x more to train. - Dario Amodei argues each AI model is independently profitable: a $100M training run generating $200M in revenue looks like a 2x return when isolated from the next investment cycle. - The per-model profitability thesis breaks down if open-source alternatives close the capability gap within months, compressing the revenue window before training costs are recouped. ### [Working with Models](https://philippdubach.com/posts/working-with-models/) Published: 2025-11-08 Description: Diffusion models corrupt data into noise, then reverse the process. Learn the math with Stefano Ermon's Stanford CS236 course, free on YouTube. Summary: There was this "I work with Models" joke which I first heard years ago from an analyst working on a valuation model (see my previous post). I guess it has become more relevant than ever: This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. Key findings: - Diffusion models work by gradually corrupting data into noise through a forward process, then learning to reverse that process to generate new samples. - Stanford CS236 by Stefano Ermon covers the full mathematical foundations of deep generative models, from VAEs and GANs to score-based diffusion, and is freely available on YouTube. - The shared mathematical ideas underlying diverse diffusion formulations trace back to linking data distributions to simple priors through a continuum of intermediate distributions. 
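The forward (noising) process has a closed form in the common DDPM parameterization, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε drawn from a standard normal. Below is a minimal sketch in pure Python. It assumes a linear β schedule from 1e-4 to 0.02 over 1,000 steps, a widely used choice but only one of the formulations the monograph covers. The learned reverse process is not shown.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ q(x_t | x_0) directly: sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, 1) per coordinate."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

alpha_bars = linear_alpha_bars()
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
print(forward_noise(x0, 0, alpha_bars, rng))    # nearly the original data
print(forward_noise(x0, 999, alpha_bars, rng))  # essentially pure Gaussian noise
```

ᾱ_t falls from almost 1 to almost 0 across the schedule. This is the "continuum of intermediate distributions" linking the data to a simple Gaussian prior, and a diffusion model is trained to walk that path backwards.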
### [Trading on Market Sentiment](https://philippdubach.com/posts/trading-on-market-sentiment/) Published: 2025-02-20 Description: GPT-3.5 matched RavenPack's 41% returns in a sentiment analysis trading strategy using 2,072 news headlines. See the full backtest results and comparison. Summary: This post is based in part on a 2022 presentation I gave for the ICBS Student Investment Fund and my seminar work at Imperial College London. As we were looking for new investment strategies for our Macro Sentiment Trading team, OpenAI had just published their GPT-3.5 model. After first experiments with the model, we asked ourselves: How would large language models like GPT-3.5 perform in predicting sentiment in financial markets, where the signal-to-noise ratio is notoriously low? And could they potentially even outperform industry benchmarks at interpreting market sentiment from news headlines? The idea wasn't entirely new. Studies [2] [3] have shown that investor sentiment, extracted from news and social media, can forecast market movements. But most approaches rely on traditional NLP models or proprietary systems like RavenPack. With the recent advances in large language models, I wanted to test whether these more sophisticated models could provide a competitive edge in sentiment-based trading. Before looking at model selection, it's worth understanding what makes trading on sentiment so challenging. News headlines present two fundamental problems that any robust system must address. First, headlines are inherently non-stationary. Unlike other data sources, news reflects the constantly shifting landscape of global events, political climates, economic trends, etc. A model trained on COVID-19 vaccine headlines from 2020 might struggle with geopolitical tensions in 2023. This temporal drift means algorithms must be adaptive to maintain relevance. Second, the relationship between headlines and market impact is far from obvious.
Consider these actual headlines from November 2020: "Pfizer Vaccine Prevents 90% of COVID Infections" drove the S&P 500 up 1.85%, while "Pfizer Says Safety Milestone Achieved" barely moved the market at -0.05%. The same company, similar positive news, dramatically different market reactions. Key findings: - GPT-3.5 returned 41.02% vs RavenPack's 40.99% on 2,072 Dow Jones Newswire headlines from 2018-2022, matching the commercial benchmark at a fraction of the cost - Both sentiment strategies underperformed buy-and-hold (58.13%) in the full bullish period, but outperformed during the volatile 2020-2022 window (22.83% vs 21.00%) - The two models showed a 0.59 sentiment score correlation, agreeing on direction but differing in granularity because GPT provides continuous scores while traditional NLP gives discrete labels - Real-world deployment faces a latency problem: GPT needs seconds to score a headline, while HFT firms act on news within milliseconds ## Investing (18 articles) ### [Reconciling Enterprise AI Revenue](https://philippdubach.com/posts/reconciling-enterprise-ai-revenue/) Published: 2026-05-17 Description: Four enterprise AI revenue figures span a 40x range. The $63.2B audit-grade floor is the only tier that defensibly underwrites $690B of hyperscaler capex. Summary: Companion to the full research report, Reconciling Enterprise AI Revenue: A Methodological Crosswalk and Vendor-Level Census, 2025. The PDF carries the 68-vendor primary-source census, the six-tier disclosure framework, the per-step sourced deductions, and the netting of structural double-counts. Key findings: - Four widely-cited 2025 enterprise AI revenue figures span 40x ($37B Menlo, $100-135B vendor run-rate sum, $307B IDC, $1.478T Gartner); each is correct under its own perimeter, so the reconciliation is definitional rather than measurement error.
- The audit-grade floor that defensibly underwrites $690B of 2026 hyperscaler capex is $63.2B narrow or $72.5B broad; looser numbers mix channel markup, third-party ARR claims, or full-retail device value into the same denominator.
- Capex-to-revenue runs at 10.9x on the narrow audit-grade basis against the 1990s telecom peak of 3.5x; even after netting only AI-incremental capex, the ratio sits at 6.3-7.9x, worse than the closest historical analogue.
- The Spread Index (audit-grade revenue over Gartner umbrella) opens at 4.28% narrow / 4.90% broad in May 2026; if it stays below 5% through Q4 2026, capex coverage cannot improve from disclosure alone and revenue itself must compound.

### [Midyear Portfolio Review: The Rotation Worked. Europe Didn't.](https://philippdubach.com/posts/midyear-portfolio-review-the-rotation-worked-europe-didnt/)
Published: 2026-05-14
Description: Midyear review of the 2026 CHF portfolio: EM, small-cap and Japan paid off; the Europe overweight underperformed; CAPE rose to 42; the dollar didn't fall.
Summary: This is a review of my own long-term personal portfolio: what I weighed, what I changed, and what the calls cost me through May 2026. It is not a recommendation. In December I rebalanced my portfolio around five theses for 2026. Five months in, this midyear review (early, because so much has happened) puts four of them ahead of where I expected and one well behind. The one I had the most personal conviction in, a five-percentage-point overweight in European equities, has trailed every other rotation it was meant to beat.

Key findings:
- The portfolio returned +3.7% in CHF through May 8, matching a global 60/40 (+3.8%) but trailing the S&P 500 in CHF (+5.8%); the 2025 dollar tailwind has nearly disappeared.
- The Shiller CAPE rose from 39.8 in December to 42.0 in May 2026, the highest reading since the December 1999 dot-com peak of 44.2.
- Three of four rotation calls paid off (Emerging Markets +19.0%, US Small Cap +12.5%, Japan +11.7% in CHF), but the Europe overweight underdelivered at +3.3%.
- NVIDIA's S&P 500 weight grew from 7.2% to 8.17% while Microsoft's slipped from 5.9% to 5.0%, making the concentration risk a single-name story rather than a Mag-7 story.

### [Two Anthropics](https://philippdubach.com/posts/two-anthropics/)
Published: 2026-05-09
Description: Two Anthropics: the safety lab Dario founded in 2021 and the $380B frontier lab it became. Same organism, two narratives the company itself has to reconcile. Three scenarios for how the tension resolves.
Summary: Anthropic was founded to be the safety lab that would pull rivals upward. Five years later it is one of the most aggressive frontier scalers, valued at $380 billion, and also the company whose own founding thesis treats frontier capability at this scale as the thing most likely to require the safeguards it says it builds.

Key findings:
- Anthropic went from a $124M Series A to a $380B valuation in five years on $10B revenue; the safety lab and the frontier lab are now the same organism.
- The November 2023 refusal of OpenAI's CEO offer signaled the safety thesis was real, but three years of 10x revenue growth made it a different company from the one that refused.
- Three forcing functions already show the paradox is binding: a March 2026 federal injunction in the DoD case, the Pottinger WSJ chip-controls op-ed, and the August 2025 Nvidia feud.
- The allocator question is whether the safety narrative is a moat or a constraint at frontier scale; the signal that decides it is the rate of safety-practice diffusion relative to capability spread.

### [AI Models Are the New Rebar](https://philippdubach.com/posts/ai-models-are-the-new-rebar/)
Published: 2026-03-11
Description: Qwen 3.5-35B runs on a gaming PC and matches Claude Sonnet 4.5. When the commodity version is 95% as good and 97% cheaper, you have a pricing problem.
Summary: Qwen 3.5-35B-A3B, a model released by Alibaba in February 2026, runs on a single consumer GPU with 24 gigabytes of VRAM. A secondhand RTX 4090, available for around $2,000, generates 60 to 100 tokens per second with it. On select benchmarks, according to Alibaba's own evaluations, it matches or beats Claude Sonnet 4.5. The Qwen 3.5 Flash tier costs $0.10 per million input tokens through Alibaba's API. Claude Sonnet 4.5 costs $3.00. That's a 97 percent discount. For comparable performance.

Key findings:
- Qwen 3.5-35B matches Claude Sonnet 4.5 on select benchmarks at $0.10 per million input tokens versus $3.00, a 97 percent cost gap for comparable performance.
- The performance gap between open-source and proprietary AI models shrank from 8 percent to 1.7 percent in a single year, per the Stanford HAI 2025 AI Index.
- Reported figures on OpenAI's 2025 margins and losses (adjusted gross margin compression and a roughly $13.5 billion H1 2025 loss) have circulated in trade press; the underlying numbers should be treated as press-sourced rather than primary disclosure until OpenAI publishes audited financials.
- AI inference prices decline at a median rate of 50x per year for equivalent performance, according to Epoch AI, a pace that dwarfs Moore's Law.

### [AI Capex Arms Race: Who Blinks First?](https://philippdubach.com/posts/ai-capex-arms-race-who-blinks-first/)
Published: 2026-03-08
Description: Alphabet's free cash flow is on track to fall 90% in 2026. Amazon's is at $11B. $690B in AI capex is cannibalizing the cash that justified these valuations.
Summary: Alphabet's free cash flow is projected to fall roughly 90% in 2026. Not because the business is in trouble. Because the company has committed to spending $83–93 billion more on capital expenditure than it did last year. That is what $660–690 billion in AI capex looks like up close. Amazon alone guided to $200 billion. Meta's long-term debt more than doubled to $58.7 billion to help finance its share.
Goldman Sachs projects cumulative 2025–2027 spending across the Big 4 at $1.15 trillion, more than double the $477 billion spent over the prior three years combined. BofA credit strategists found this will consume 94% of operating cash flow net of dividends and buybacks.

Key findings:
- The Big 4 hyperscalers are on track to spend $610–665 billion in 2026, roughly 70% above 2025 levels, with Goldman Sachs projecting cumulative 2025–2027 spend at $1.15 trillion
- Alphabet's free cash flow may fall from $73 billion to roughly $8 billion in 2026 as capex doubles; Amazon's is already compressed to $11 billion TTM with $200B guidance ahead
- Direct AI revenue covers roughly 15% of AI-specific capex: Sequoia's David Cahn calculated the ecosystem needs $600 billion in annual revenue to justify current infrastructure spending, against the roughly $50–100 billion it actually generates
- Inference costs are falling 50–200x per year (Epoch AI), meaning existing GPU infrastructure may become stranded faster than depreciation schedules assume

### [Every Bulge Bracket Bank Agrees on AI](https://philippdubach.com/posts/every-bulge-bracket-bank-agrees-on-ai/)
Published: 2026-03-01
Description: I read 12 AI research reports from Goldman Sachs, JPMorgan, UBS, and 6 other banks. Here's the consensus they're pushing, and what they're not saying.
Summary: I spent the last week reading 12 bank AI research reports from nine of the world's largest financial institutions: Goldman Sachs, JPMorgan, Morgan Stanley (three separate reports), UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander. I wanted to understand how institutions that collectively manage trillions of dollars and employ thousands of analysts actually see this technology heading into 2026: where they agree, where they diverge, and what they're being less than forthcoming about.

Key findings:
- Not a single report from any of the nine institutions recommends reducing AI exposure. The absence of a bearish voice is itself the most important signal in the entire collection
- The macro productivity estimates span from +0.7% to +15% TFP over ten years, drawing on the same underlying academic papers, cherry-picked to support nine different commercial narratives
- Only ~10% of US companies are productively using AI and 42% have abandoned GenAI projects. The gap between capex commitment and actual adoption is the most underweighted risk in the consensus
- AI capex already contributed 1.4–1.5 percentage points to US GDP growth in H1 2025, making infrastructure spending the dominant driver of US economic expansion in that period
- Morgan Stanley's historical data shows second-order beneficiaries outperform first-order enablers by 10–100x over long horizons, yet nearly every bank's current positioning favours first-order plays anyway

### [The Absolute Insider Mess of Prediction Markets](https://philippdubach.com/posts/the-absolute-insider-mess-of-prediction-markets/)
Published: 2026-02-22
Description: A suspected Google insider made $1.15M on Polymarket in 24 hours. Israeli soldiers bet on classified strike timing. Why prediction markets need insider trading regulation.
Summary: A wallet that observers have suggested is operated by, or close to, someone with access to non-public Google information deposited $3 million into Polymarket on December 3, 2025, placed bets on 23 separate "Google Year in Search" outcomes, got 22 right, and, according to the reporting, walked away with $1.15 million in profit in under 24 hours. One of those bets: that d4vd would be the most-searched person of 2025, purchased at roughly 5 cents when the market gave it a 0.2% probability. No regulator has charged anyone in connection with these trades.

Key findings:
- A suspected Google insider went 22-for-23 on Polymarket, turning $3M into $4.15M in under 24 hours, and no U.S. regulator has acted on any of the three major cases since December 2025
- A Fed working paper found Kalshi's macro markets matched the actual FOMC rate outcome on the day before every meeting since 2022, outperforming traditional forecasting tools in certain windows
- Combined prediction market volume hit $44B in 2025, roughly 300x the level from early 2024, while the CFTC has brought zero insider trading enforcement actions
- Unchecked insider trading triggers Akerlof's lemons dynamic: market makers widen spreads, uninformed participants leave, and the accuracy gains from one insider's trade are offset by the liquidity collapse that follows

### [The SaaSpocalypse Paradox](https://philippdubach.com/posts/the-saaspocalypse-paradox/)
Published: 2026-02-13
Description: AI capex failure and AI replacing all software are mutually exclusive. Why the 2026 SaaSpocalypse is a $2 trillion pricing error, not an extinction event.
Summary: The market is simultaneously pricing AI capex failure and AI destroying all software. Both cannot be true.

Anthropic released 11 open-source plugins for Claude Cowork on January 30: Apache-2.0 licensed, file-based, and running in a macOS-only research preview. Within a week, the IGV software ETF had fallen 32% from its September peak to a 52-week low of $79.65, roughly $2 trillion in market cap had evaporated, and hedge funds had made $24 billion shorting the sector. The RSI hit 18, the most oversold reading since 1990. JP Morgan titled their note "Software Collapse Broadens with Nowhere to Hide." Jefferies coined the term SaaSpocalypse. It was the worst software stock crash since the dot-com bust.
Key findings:
- The market is simultaneously punishing hyperscalers for weak AI capex returns and destroying software stocks because AI adoption is so strong it replaces all software; both cannot be true
- The IGV software ETF fell 32% to an RSI of 18, the most oversold reading since 1990, while the sector delivered 17% aggregate earnings growth and every major name beat Q4 2025 estimates
- Recurring-revenue software with 90%+ gross margins now trades at 32.4x forward earnings versus 43.6x for cyclical semiconductors, an inversion of 11.2 turns of earnings that has not persisted historically
- Goldman Sachs projects the application software market growing to $780 billion by 2030, with a16z arguing AI expands the addressable market from $350 billion in software to $6 trillion in white-collar services

### [Buying the Haystack Might Not Work This Year](https://philippdubach.com/posts/buying-the-haystack-might-not-work-this-year/)
Published: 2026-01-31
Description: a16z sees AI fundamentals thriving with 80% GPU utilization. AQR sees the CAPE at the 96th percentile. Both have data. Both may be right.
Summary: I've been reading the January 2026 state-of-markets reports from Andreessen Horowitz and AQR, and their conclusions on the AI bubble question in 2026 are almost impossible to reconcile. The a16z view is straightforward: AI fundamentals are real, and current prices reflect that reality. Their evidence is compelling. The top 50 private AI companies now generate $40.6 billion in annual revenue. Companies like ElevenLabs and Cursor are hitting $100 million ARR faster than Slack or Twilio ever did. GPUs are running at 80% utilization, compared to the 7% utilization rate for fiber optic cables during the dotcom bubble. This isn't speculation, they argue. It's demand exceeding supply.

AQR looks at the same market and sees something else entirely. Their capital market assumptions put the U.S. CAPE ratio at the 96th percentile since 1980. Expected real returns for U.S. large cap equities over the next 5-10 years? 3.9%. For a global 60/40 portfolio, just 3.4%, well below the long-term average of roughly 5% since 1900. Risk premia, in their framework, are compressed across nearly every asset class. The narrative doesn't enter their models.

a16z points to earnings growth. The market rally hasn't been driven by multiple expansion, they note, but by actual EPS growth. Tech P/E multiples sit around 30-35x, elevated but nowhere near the 70-80x of 2000. Tech margins have "lapped the field" at 25%+ compared to 5-8% for the rest of the S&P 500. The fundamentals, they insist, are doing the work.

AQR's response would be that fundamentals always look good near peaks. Their research shows a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade. Compressed premia don't announce themselves with blaring headlines. They just quietly erode returns until investors notice they've been running in place.

Key findings:
- GPU utilization runs at 80% versus 7% for fiber optic cables during the dotcom era, but the U.S. CAPE ratio sits at the 96th percentile since 1980, historically associated with low future returns
- Cumulative hyperscaler capex is projected to hit $4.8 trillion by 2030, requiring roughly $1 trillion in annual AI revenue to clear a 10% hurdle rate
- Non-U.S. developed markets offer expected returns around 5% versus 3.9% for U.S. large caps, a valuation gap that holds even if the AI story is true
- AQR estimates a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade

### [The Market Can Stay Irrational Longer Than You Can Stay Solvent](https://philippdubach.com/posts/the-market-can-stay-irrational-longer-than-you-can-stay-solvent/)
Published: 2026-01-11
Description: Steve Eisman explains how U.S. equity markets have structurally decoupled from everyday economic reality through concentration and passive investing.
Summary: A friend recently recommended Steve Eisman's podcast to me. Eisman, you might recall, is the hedge fund manager portrayed in The Big Short who famously bet against subprime mortgages before the 2008 crisis. In his most recent episode, Eisman laid out a thesis for something that has made me uncomfortable ever since the market's recovery from the Covid-19 crash: the U.S. equity market has structurally decoupled from everyday economic reality. I've written about market concentration in my 2026 portfolio allocation. But Eisman's point isn't just about concentration. It's about what this concentration means for everyone else. Consider what happens to consumer-exposed sectors. Combined, healthcare, consumer discretionary, and consumer staples have fallen from 38% of the index in 2015 to just 25% today. This matters because roughly 70% of U.S. GDP is consumer-driven. The traditional logic was simple: consumer spending drives the economy, consumer stocks reflect that spending, and therefore the stock market reflects economic health. That relationship has broken down.

Key findings:
- Consumer-exposed sectors have fallen from 38% to 25% of the S&P 500 since 2015, even though 70% of U.S. GDP is consumer-driven, breaking the link between the index and everyday economic reality
- Index funds now control 60% of flows, buying mechanically in proportion to market cap with no price sensitivity, meaning corrections lose the stabilizing bid that active managers once provided
- With NVIDIA at 7.7%, Apple at 6.8%, and Microsoft at 6.1% of the index, risk limits in most institutional mandates prevent managers from holding proportional positions
- Massive embedded capital gains create asymmetric liquidity: plenty of buyers on the way up, scarce ones on the way down, because selling triggers taxable events investors delay until forced

### [Praise by Name, Criticize by Category: Warren Buffett Retires at 95](https://philippdubach.com/posts/praise-by-name-criticize-by-category-warren-buffett-retires-at-95/)
Published: 2026-01-06
Description: Buffett exits after Berkshire paid $26.8B in taxes. What 60 years of letters reveal about admitting mistakes, insurance float, and why Abel inherits $300B in cash.
Summary: Warren Buffett has stepped down as CEO at 95. Greg Abel inherits a company that paid $26.8 billion in federal income taxes last year, roughly 5% of what all of corporate America paid combined. I do not have much in common with Buffett, but I will miss his shareholder letters. Berkshire's archive is a rare case of a public company explaining decisions candidly to its owners. In the 2024 letter Buffett repeats Tom Murphy's rule: "Praise by name, criticize by category." Murphy gave him this advice 60 years ago. The letter closes with another line worth keeping: "Kindness is costless but priceless."

Key findings:
- Berkshire paid $26.8B in federal income taxes last year, roughly 5% of all U.S. corporate tax receipts; its insurance float engine generated $9B in underwriting profit and $13.7B in investment income in 2024
- Between 2019 and 2023, Buffett used 'mistake' or 'error' 16 times in shareholder letters, while many Fortune 500 companies never used either word once
- Apple at its peak represented 40-50% of Berkshire's public equity portfolio, meaning one stock bought mostly between 2016 and 2018 drove a substantial share of decade-long returns
- Abel inherits roughly $300B in cash and Treasury bills, not because Buffett prefers cash but because nothing available meets Berkshire's price discipline at current valuations

### [How AI is Shaping My Investment Portfolio for 2026](https://philippdubach.com/posts/how-ai-is-shaping-my-investment-portfolio-for-2026/)
Published: 2025-12-12
Description: Rebalancing for 2026: reducing S&P 500 at 40× CAPE, adding Europe after Germany's €1T pivot, and bonds at 4.2% yields. Full allocation rationale.
Summary: This essay describes how I am thinking about my own long-term portfolio heading into 2026. It is a journal of how I structured my allocation given the macro context, not a recommendation for any reader to take any action. I separately run a satellite book of short-term option plays and individual stocks; this essay does not cover that. The essay is structured along five themes I currently weigh.

Key findings:
- The S&P 500's Shiller CAPE ratio sits at 40.5, more than double its historical mean of 17.3, with the top 10 companies representing 45% of the index.
- Germany's historic fiscal pivot commits over 1 trillion euros to infrastructure and defense, narrowing the US-Europe growth gap from 60bps to 30bps while European equities trade at a 22% discount to global peers.
- AI capex is projected to reach $1.3 trillion by 2030 (3.8% of US GDP), exceeding every prior infrastructure boom including broadband, electricity, and the Apollo program.
- The US dollar remains roughly 10% overvalued per JP Morgan, with its reserve share declining from 71% in 1999 to 56% in 2025, supporting gold and currency-hedged international exposure.

### [Not Logan Roy: Netflix vs. Paramount's Bidding War](https://philippdubach.com/posts/not-logan-roy-netflix-vs.-paramounts-bidding-war/)
Published: 2025-12-09
Description: Netflix's $72B Warner Bros deal vs Paramount's hostile $30/share tender. Deal mechanics, aggregation theory, and why internet distributors win streaming.
Summary: In the HBO series Succession, billionaire Logan Roy's children spent four seasons scheming, backstabbing, and making offers to inherit a media empire. This week, the real version played out with more zeros and a $252 billion Oracle stake. Time for a closer look. On Friday, Warner Bros. Discovery's board agreed to sell the company to Netflix for $72 billion. By Monday, Paramount had launched a hostile tender offer directly to shareholders at $30 per share, all cash. In this post I look at the gap between those two numbers, streaming economics, aggregation theory, and hostile deal mechanics.

The Netflix offer breaks down into three pieces: $23.25 per share in cash, $4.50 per share in Netflix stock subject to a collar, and shares in a spun-off entity called Discovery Global containing CNN and the cable networks that Netflix doesn't want. Analysts value that stub somewhere between $2 and $5 per share, which puts the total package at roughly $29.75 to $32.75. Paramount is offering $30 per share in cash for the entire company, including the cable assets. Warner's stock closed Friday at $26.08 and opened Monday around $27.64, which tells you the market expects a bidding war but isn't fully convinced either deal closes.

Key findings:
- Netflix agreed to acquire Warner Bros for $72B in equity value, while Paramount launched a hostile $30/share all-cash tender offer, backed by $54B in debt and Larry Ellison's $252B Oracle stake.
- Netflix commands a $425B market cap because internet distribution has zero marginal cost, while combined legacy studios are worth a fraction, a textbook case of aggregation theory.
- If Warner shareholders take Paramount's offer, Warner owes Netflix a $2.8B breakup fee; if Netflix's deal collapses, Netflix owes $5.8B, one of the largest reverse breakup fees on record.

### [Nike's Crisis and the Economics of Brand Decay](https://philippdubach.com/posts/nikes-crisis-and-the-economics-of-brand-decay/)
Published: 2025-12-02
Description: Nike lost $28B by weakening product development, athlete partnerships, and marketing simultaneously. Data-driven analysis of how complementary assets collapse.
Summary: Nike's $28 Billion Value Destruction.
In July 2024, Scott Galloway argued on his podcast, in his own words, that Nike's then-CEO had organisational and personnel problems at the head of "the strongest brand or one of the strongest brands in consumer history." The quote is Galloway's opinion; I cite it not as my own characterisation but because the underlying critique (Nike's organisational structure under that period of leadership) is one I will examine on its own evidence below. In March 2025, Nike reported its worst revenue decline in nearly five years: an 11.5% drop to $11.01 billion. Digital sales fell 20%, app downloads decreased 35%, and store foot traffic declined 11%. Nike's crisis reveals how competitive advantages work, and how quickly they can disappear when the company that once captured roughly half of the US athletic footwear market systematically weakens its own foundations.

Key findings:
- Nike terminated hundreds of wholesale accounts to capture 50% direct margins over 30-35% wholesale margins, but competitors On and Hoka immediately filled the vacated shelf space and grew from a combined $682M to $3.2B in revenue between 2020 and 2025.
- On grew from $330M to $1.8B revenue and Hoka from $352M to $1.4B between 2020 and 2025, exploiting the product and distribution gap Nike created under CEO John Donahoe.
- Nike's gross margins compressed 190 basis points to 42.7% as the direct-to-consumer shift, organizational restructuring, and marketing pivot destroyed three complementary assets simultaneously.
- Nike manufactures 95% of its shoes in Southeast Asia and faces $1B to $1.5B in additional tariff costs, compounding a crisis rooted in strategy rather than trade policy.

### [Michael Burry's $379 Newsletter](https://philippdubach.com/posts/michael-burrys-379-newsletter/)
Published: 2025-11-28
Description: Michael Burry launches a Substack warning that AI markets mirror 1999. His Nvidia-Cisco comparison, the GPU depreciation debate, and what hyperscalers need to justify capex.
Summary: Michael Burry (who in your head probably looks like Christian Bale thanks to The Big Short), the investor who famously predicted the 2008 housing crash, has launched a Substack newsletter after deregistering his hedge fund. The $379 annual subscription capitalizes on the 1.6 million followers he's built on X, offering what he describes as his "sole focus" going forward. The newsletter's inaugural post (which he kindly made free to read as a Thanksgiving gift today) takes readers back to 1999, when Burry was a 27-year-old neurology resident at Stanford making $33,000 a year while carrying $150,000 in medical school debt. At the time he wrote his Valuestocks.net article "Buffett Revisited". A fellow resident casually mentioned making $1.5 million on Polycom stock. Physicians crowded around terminals checking stocks while patients waited. In that environment, Burry was writing investment analysis late at night, getting paid $1 per word by MSN Money under the pen name "Value Doc." His VSN Fund returned 68.1% in 1999, and by February 2000, the San Francisco Chronicle noted he had shorted Amazon. Fourteen days after that article appeared, the NASDAQ topped, a peak it wouldn't revisit for 15 years.

Key findings:
- Burry argues Nvidia is Cisco circa 2000: the picks-and-shovels supplier at the centre of an infrastructure cycle built on demand forecasts that may not materialise. Nvidia has publicly disputed his characterisation.
- Burry contends, and Nvidia disputes, that hyperscalers depreciate GPUs over longer useful lives than the underlying obsolescence cycle. Whether his framing or Nvidia's is correct is an unresolved accounting debate.
- As a hypothetical sensitivity exercise on Alphabet's roughly $90B AI capex guidance, 5-year depreciation plus a 10% WACC implies the company would need ~$40B per year in incremental AI-attributable revenue to clear the hurdle at a 70% blended margin. This is one possible 'if X then Y' arithmetic exercise, not an accounting allegation.
- One key difference from 2000: Cisco's forward P/E was around 200 at its peak, while Nvidia's is under 40

### [Everything is a DCF Model](https://philippdubach.com/posts/everything-is-a-dcf-model/)
Published: 2025-10-19
Description: Michael Mauboussin's argument that every cash-generating asset is valued through a DCF model. Why this Morgan Stanley paper changed how I think about value.
Summary: A brilliant piece of writing from Michael Mauboussin and Dan Callahan at Morgan Stanley that was formative for what I personally believe about valuation.
"[…] we want to suggest the mantra "everything is a DCF model." The point is that whenever investors value a stake in a cash-generating asset, they should recognize that they are using a discounted cash flow (DCF) model. […] The value of those businesses is the present value of the cash they can distribute to their owners. This suggests a mindset that is very different from that of a speculator, who buys a stock in anticipation that it will go up without reference to its value.
Investors and speculators have always coexisted in markets, and the behavior of many market participants is a blend of the two."

Key findings:
- Mauboussin's core claim: every valuation of a cash-generating asset is implicitly a DCF model, whether the investor builds one explicitly or not.
- The framework draws a clean line between investors (who price future cash flows) and speculators (who buy in anticipation of price increases without reference to value).
- DCF does not apply to gold, art, or crypto because these assets produce no cash flows, meaning their prices are set purely by supply and demand.

### [Crypto Mean Reversion Trading](https://philippdubach.com/posts/crypto-mean-reversion-trading/)
Published: 2024-11-11
Description: How I built a crypto mean reversion trading bot using PELT change point detection on Kraken, targeting altcoin price overreactions with automated execution.
Summary: In late 2021, Lars Kaiser's paper on seasonality in cryptocurrencies inspired me to use my Kraken API key to try to make some money. A quick summary of the paper:
(1) Kaiser analyzes seasonality patterns across 10 cryptocurrencies (Bitcoin, Ethereum, etc.), examining returns, volatility, trading volume, and spreads.
(2) He finds no consistent calendar effects in cryptocurrency returns, supporting weak-form market efficiency.
(3) He observes robust patterns in trading activity: lower volume, volatility, and spreads in January, on weekends, and in summer months.
(4) He documents a significant impact of the January 2018 market sell-off on seasonality patterns.
(5) He reports a "reverse Monday effect" for Bitcoin (positive Monday returns) and a "reverse January effect" (negative January returns).
(6) Trading activity patterns suggest crypto markets are dominated by retail rather than institutional investors.
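The bot's buy rule in this entry (buy when a coin drops more than 4 standard deviations over a 2-hour window, sell automatically 2 hours later) can be sketched as a small signal generator. This is a minimal sketch under stated assumptions, not the original bot: it assumes evenly spaced 5-minute closes (so 24 bars equal 2 hours), scores each 2-hour log return against the trailing 288 two-hour returns, and leaves out Kraken order execution and the PELT change-point confirmation entirely. The function names and all default parameters are hypothetical.

```python
import math
import statistics


def mean_reversion_signals(prices, window=24, lookback=288, z_entry=4.0):
    """Return (entry_index, exit_index) pairs into `prices` for a long-only
    mean-reversion rule.

    Hypothetical parameters: with 5-minute bars, window=24 is both the 2-hour
    move being measured and the 2-hour holding period; lookback=288 trailing
    2-hour returns form the baseline for the z-score.
    """
    # rets[i] is the 2-hour log return ending at prices[i + window].
    rets = [
        math.log(prices[k] / prices[k - window])
        for k in range(window, len(prices))
    ]
    trades = []
    i = lookback
    while i < len(rets):
        # Trailing baseline; overlapping 2-hour returns are autocorrelated,
        # so this z-score is only a rough yardstick.
        baseline = rets[i - lookback:i]
        mu = statistics.fmean(baseline)
        sd = statistics.stdev(baseline)
        if sd > 0 and (rets[i] - mu) / sd < -z_entry:
            entry = i + window        # buy at the close that completes the drop
            exit_ = entry + window    # sell automatically 2 hours later
            if exit_ >= len(prices):
                break                 # not enough data left to hold the position
            trades.append((entry, exit_))
            i += window               # no new entries until the exit bar
        else:
            i += 1
    return trades


def trade_returns(prices, trades):
    """Simple return of each (entry, exit) trade, before fees and slippage."""
    return [prices[x] / prices[e] - 1.0 for e, x in trades]
```

On a synthetic series that oscillates gently and then gaps down 20% at bar 500, `mean_reversion_signals` returns the single trade `[(500, 524)]`; on the same series without the gap it returns no trades. Using a rolling z-score rather than a fixed percentage threshold is one way to make "4 standard deviations" adapt to each coin's own volatility.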
Key findings:
- The bot bought altcoins on Kraken when prices dropped more than 4 standard deviations over a 2-hour window, then sold automatically after 2 hours, betting on mean reversion
- PELT change point detection identified structural breaks in the ETH price series, providing signal confirmation for when statistical properties of the time series shifted
- Major cryptos like BTC and ETH are becoming more efficient, but smaller altcoins with thin order books and retail-dominated trading still exhibit exploitable mean reversion patterns

### [My First 'Optimal' Portfolio](https://philippdubach.com/posts/my-first-optimal-portfolio/)
Published: 2024-03-15
Description: How I built Python portfolio optimization tools, raised the Sharpe ratio from 0.65 to 1.68, and published the results as an academic paper on MPT.
Summary: My introduction to quantitative portfolio optimization happened during my undergraduate years, inspired by Attilio Meucci's Risk and Asset Allocation and the convex optimization teachings of Diamond and Boyd at Stanford. With enthusiasm and perhaps more confidence than expertise, I created my first "optimal" portfolio. What struck me most was the disconnect between theory and accessibility. Modern Portfolio Theory had been established for decades (Markowitz shared the 1990 Nobel Prize for it), yet the optimization tools remained largely locked behind proprietary software.
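The Sharpe ratios quoted for this project are consistent with a zero risk-free rate: at a 9.4% return, 9.4 / 14.4 ≈ 0.65 and 9.4 / 5.6 ≈ 1.68, so the whole improvement comes from cutting volatility at an unchanged return. A minimal sketch of that idea follows, assuming the textbook closed-form Markowitz solution for the minimum-variance portfolio at a target return, with fully invested weights and short sales allowed. The original tools used convex optimization and CVaR, which this does not reproduce; the helper names and the example inputs are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small n)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def min_variance_weights(mu, cov, target):
    """Minimum-variance weights with sum(w) == 1 and w . mu == target."""
    inv_ones = solve(cov, [1.0] * len(mu))   # cov^-1 1
    inv_mu = solve(cov, mu)                  # cov^-1 mu
    a = sum(inv_ones)                            # 1' cov^-1 1
    b = sum(inv_mu)                              # 1' cov^-1 mu
    c = sum(m * z for m, z in zip(mu, inv_mu))   # mu' cov^-1 mu
    d = a * c - b * b
    lam = (a * target - b) / d   # multiplier on the return constraint
    gam = (c - b * target) / d   # multiplier on the budget constraint
    return [lam * zm + gam * z1 for zm, z1 in zip(inv_mu, inv_ones)]


def portfolio_stats(w, mu, cov, rf=0.0):
    """Return (expected return, volatility, Sharpe ratio)."""
    n = len(w)
    ret = sum(wi * mi for wi, mi in zip(w, mu))
    vol = math.sqrt(sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n)))
    return ret, vol, (ret - rf) / vol
```

For example, with three made-up assets (`mu = [0.08, 0.10, 0.06]` and a small positive-definite `cov`), calling `min_variance_weights(mu, cov, target)` with the equal-weight portfolio's return as `target` yields weights that match that return at equal or lower volatility. Because the equal-weight portfolio also satisfies both constraints, the optimizer can never do worse on volatility, which is the kind of same-return, lower-risk improvement this entry reports.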
Key findings: - Mean-variance optimization tripled the Sharpe ratio from 0.65 to 1.68 while cutting volatility from 14.4% to 5.6% at the same 9.4% return - Out-of-sample testing across the 2018 bear market and 2019 bull market showed consistent CVaR reduction and improved risk-adjusted returns - The project was published as an academic paper to fill the gap between established MPT theory and the lack of accessible open-source Python optimization tools at the time ## Quantitative Finance (10 articles) ### [The Anatomy of a Decentralized Prediction Market: Notes from the Polymarket Order Book](https://philippdubach.com/posts/the-anatomy-of-a-decentralized-prediction-market-notes-from-the-polymarket-order-book/) Published: 2026-05-02 Description: A 624 GB tick-level archive of Polymarket's WebSocket feed joined to the on-chain trade record reveals eight cross-sectional stylized facts and a measurement result: the public feed only recovers trade direction 59% of the time. Summary: I spent the last two months running a Polymarket order-book collector. The collector runs on a small VM, subscribes to the public WebSocket feed, and writes one Parquet file per UTC hour. By 2026-04-15 the archive had grown to 1,262 hourly files, 30,287,264,368 events, 623.8 GB on disk, covering 52 calendar days and 385,198 distinct market ids. The first version of the paper is up on arXiv, the replication package is on GitHub and Zenodo (DOI 10.5281/zenodo.19811426), and the manuscript is under review at the Journal of Financial Markets. Key findings: - Polymarket's lowest-probability decile carries a 650-900 bps half-spread, an order of magnitude wider than US equities, and looks more like a liquidity-provision constraint than a behavioral longshot bias. - Trade direction inferred from Polymarket's public WebSocket feed agrees with the on-chain OrderFilled record only 59% of the time, barely above the 50% chance baseline and 22 points below Lee-Ready accuracy on Nasdaq. 
- On the top-100 stratum, the effective half-spread changes sign between feed-inferred and on-chain trade directions in 67% of markets, and Kyle's lambda flips sign in 60%, so any direction-dependent measure must be sourced on-chain. - The pre-registered 600-market panel covers 30.3 billion order-book events across 52 days joined to 255 million on-chain OrderFilled events, with the full pipeline released as a Zenodo replication package. ### [The Moral Philosophy of Investing in Ignorance](https://philippdubach.com/posts/the-moral-philosophy-of-investing-in-ignorance/) Published: 2026-04-22 Description: Constraint arbitrage, the sidecar problem, and who bears the distributional cost of investing under ignorance. The final installment of Edge of Knowledge. Summary: Investing at the Edge of Knowledge, Part 5 · Start with Part 1 Key findings: - Most alpha in UU (unknown and unknowable) situations is constraint arbitrage: profiting from the gap between institutional rationality (right for the manager's career) and market rationality (right price for the asset). - Returns from UU mispricing flow disproportionately to wealthy individuals and family offices because institutional governance requires probability estimates the ignorance box can't produce. - Zeckhauser's sidecar concept is ethically clean when the driver has capability (a developer building a property) but murkier when the complementary asset is political power rather than skill. - Munger's compliment ('think the way Zeckhauser plays bridge') captures the series thesis: the best investors reason about what they don't know, not what they do. ### [Bet Sizing at the Frontier](https://philippdubach.com/posts/bet-sizing-at-the-frontier/) Published: 2026-04-17 Description: The Kelly Criterion assumes you know your probability of winning. In a UU world, you don't, and heuristics like Zeckhauser's Maxim B replace false precision.
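The discrete Kelly fraction this installment starts from, f = (bp − q)/b, is easy to check numerically. A minimal sketch, assuming a repeated binary bet that pays b-to-1 with win probability p; the inputs p = 0.6 and b = 1 are illustrative, not from the article. It computes the fraction and confirms that it maximizes expected log growth, which is why Kelly is tied to log utility:

```python
import math

def kelly_fraction(p: float, b: float) -> float:
    """Kelly stake as a fraction of bankroll for a bet paying b-to-1 with win probability p."""
    q = 1.0 - p
    return (b * p - q) / b

def log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth per bet when staking fraction f of the bankroll."""
    q = 1.0 - p
    return p * math.log(1.0 + b * f) + q * math.log(1.0 - f)

p, b = 0.6, 1.0  # illustrative inputs, not from the article
f_star = kelly_fraction(p, b)
print(f"Kelly fraction: {f_star:.2f}")  # Kelly fraction: 0.20

# Growth peaks at f_star; staking half or 1.5x as much grows more slowly.
for f in (f_star / 2, f_star, 1.5 * f_star):
    print(f"f = {f:.2f}  growth per bet = {log_growth(f, p, b):.4f}")
```

In this example, overbetting by 50% costs slightly more growth than underbetting by 50%. The whole calculation also depends on p being known, which is exactly the input the ignorance box withholds.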
Summary: Investing at the Edge of Knowledge, Part 4 · Start with Part 1 Key findings: - Kelly's formula (f = (bp - q) / b, where p is the probability of winning, q = 1 - p, and b is the net odds) maximizes the geometric growth rate but requires knowing your probability of winning, an input that is undefined in Zeckhauser's ignorance box. - Samuelson's 1979 critique (written entirely in one-syllable words) showed Kelly only maximizes expected utility for log-utility investors, meaning it systematically overbets for anyone more risk-averse. - Renaissance's Medallion Fund returned roughly 66% annually before fees from 1988 to 2021, applying Kelly-based position sizing to thousands of short-duration trades where probabilities were estimable. - Zeckhauser's Maxim B replaces formula-based precision with a judgment-based heuristic: bet proportionally to your edge, and if nothing looks foolish after the fact, you were too cautious. ### [The Geometry of Who Knows What](https://philippdubach.com/posts/the-geometry-of-who-knows-what/) Published: 2026-04-13 Description: When neither side can define the states of the world, adverse selection fears are misplaced. Zeckhauser's information matrices and constraint arbitrage. Summary: Investing at the Edge of Knowledge, Part 3 · Start with Part 1 Key findings: - Zeckhauser's information matrices distinguish when the other side knows more (danger) from when ignorance is shared (opportunity), and most investors assume the wrong box. - Wall Street rejected Buffett's $1B California earthquake reinsurance not because they feared adverse selection but because their compliance models required probability estimates that didn't exist. - Constraint arbitrage, profiting from the gap between what an asset is worth and what institutions can hold, is a permanent structural feature of UU markets, not a temporary inefficiency. - The sidecar concept (investing alongside skilled operators) relocates the evaluation problem from asset selection to manager selection, which Summers and Robb argue may be equally hard.
### [Ambiguity by Design](https://philippdubach.com/posts/ambiguity-by-design/) Published: 2026-04-08 Description: Ellsberg proved people flee unknown odds. Zeckhauser showed their flight creates mispricing. Part 2 on ambiguity aversion, comparative ignorance, and investing. Summary: Investing at the Edge of Knowledge, Part 2 · Start with Part 1 Key findings: - Ellsberg's 1961 experiment proved people prefer a known 50/50 bet over unknown odds they can take either side of, fleeing the feeling of not-knowing rather than any real informational disadvantage. - Fox and Tversky (1995) found ambiguity aversion intensifies when subjects compare their knowledge to someone who appears more informed, a condition permanently active in financial markets. - The IGV's $2 trillion sell-off was ambiguity aversion amplified by career risk, compliance constraints, and fiduciary duty, not a response to information about fundamentals. - Zeckhauser argues your discomfort facing an ambiguous asset tells you nothing about the asset but everything about the competitive field, because other buyers already left. ### [Three Kinds of Not-Knowing](https://philippdubach.com/posts/three-kinds-of-not-knowing/) Published: 2026-04-04 Description: Knightian uncertainty splits not-knowing into risk, uncertainty, and ignorance. A century after Knight and Keynes, most of investing still ignores the split. Summary: Investing at the Edge of Knowledge, Part 1 Key findings: - Zeckhauser's 2006 framework splits not-knowing into risk (known distributions), uncertainty (unknown probabilities), and ignorance (undefined states): most of finance covers only the first. - Buffett wrote a $1.5B California earthquake reinsurance policy that the capital markets couldn't place because institutional models required probability estimates nobody had.
- The IGV software ETF fell 32% while sector earnings grew 17% in early 2026 because investors couldn't define the possible states of AI disruption, not just estimate their probabilities. - Knight and Keynes independently argued in 1921 that not all uncertainty reduces to calculable probability, but the discipline chose formalization and both arguments lost for a century. ### [Long Volatility Premium](https://philippdubach.com/posts/long-volatility-premium/) Published: 2026-02-14 Description: One River's data shows beta-adjusted long volatility outperformed the S&P 500 over 40 years. Goldman, AQR, and Universa agree on the mechanism but disagree on implementation. A synthesis of the evidence. Summary: The real value of tail hedging is not in the hedge itself. It's in what the hedge enables. In The Variance Tax I wrote about the ½σ² formula: compound returns equal arithmetic returns minus half the variance, and because the penalty is quadratic, large drawdowns destroy wealth in ways that are hard to recover from. A portfolio that falls 50% needs 100% just to break even. That piece was about the problem. This one is about a potential solution, and about whether paying for crash protection can actually improve total returns rather than drag them. 
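The ½σ² rule and the break-even arithmetic in this summary can be checked in a few lines. A minimal sketch; the 10% mean and the volatility levels are illustrative inputs, and the growth formula is the usual second-order approximation rather than an exact result:

```python
def compound_growth_approx(mu: float, sigma: float) -> float:
    """Approximate compound growth rate: arithmetic mean minus half the variance."""
    return mu - 0.5 * sigma ** 2

def breakeven_gain(drawdown: float) -> float:
    """Gain needed to recover from a fractional drawdown (0.5 means a 50% loss)."""
    return 1.0 / (1.0 - drawdown) - 1.0

print(f"{breakeven_gain(0.50):.0%}")  # 100%
print(f"{breakeven_gain(0.20):.0%}")  # 25%

# Same 10% arithmetic mean; only the volatility changes.
for sigma in (0.0, 0.2, 0.5):
    g = compound_growth_approx(0.10, sigma)
    print(f"sigma = {sigma:.0%}: compound growth ~ {g:+.1%}")
```

Because the penalty is quadratic, doubling σ from 20% to 40% quadruples the drag from 2 to 8 percentage points a year.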
Key findings: - One River's 40-year data shows a beta-adjusted long volatility overlay improved S&P 500 total returns while reducing drawdowns, because neutralizing the put's short-delta isolates convexity that pays off in crashes - A 3.3% allocation to Universa with the rest in the S&P 500 compounded at 12.3% annually over 10 years, beating the index by truncating the variance tax on compound returns - AQR finds puts and trend-following are complementary: puts returned over 42% during the sudden COVID crash while trend-following excelled in protracted bear markets like the dot-com bust - Several popular tail-risk strategies, including short-dated VIX futures, underperformed a simple cash allocation by 355 basis points, suggesting implementation matters more than the concept ### [Variance Tax](https://philippdubach.com/posts/variance-tax/) Published: 2026-02-06 Description: Variance drain is the hidden cost of volatility: why a portfolio averaging +10% can lose money. The ½σ² formula explains the gap between paper and real returns. Summary: Let's say your portfolio returned +60% in 2024, then fell 40% in 2025. That's an arithmetic average return of +10% a year. Actual return after two years: minus 4% (i.e., $100 × 1.6 × 0.6 = $96), or roughly −2% a year compounded. That gap between the +10% you averaged and the −2% you compounded is what we call the variance tax, also known as variance drain or volatility drag, and it's one of the least intuitive forces in investing. Take any series of returns with arithmetic mean μ and volatility σ.
The compound growth rate, the one that actually determines your wealth, is approximately: g ≈ μ − ½σ². Key findings: - Variance drain equals ½σ²: doubling volatility quadruples the cost to compound returns - The Kelly criterion (L* = (μ-r)/σ²) falls directly out of the variance drain formula, giving the leverage that maximizes compound growth - Half-Kelly sizing sacrifices ~25% of theoretical growth but dramatically reduces drawdown risk from estimation error - With the same 10% arithmetic return, $100 at 50% vol loses more than half its value over 30 years; at 0% vol it grows to about $1,745 ### [Is Private Equity Just Beta With a Lockup?](https://philippdubach.com/posts/is-private-equity-just-beta-with-a-lockup/) Published: 2026-01-29 Description: AQR's 2026 data shows private equity returning 4.2% versus 3.9% for public equities. The 30bp illiquidity premium barely justifies years of lockup. Summary: The pitch used to be simple: accept illiquidity, get rewarded. Lock up your capital for seven years, tolerate capital calls and J-curves, and in exchange you'd earn returns that public markets couldn't touch. It was the defining bargain of institutional investing for two decades. AQR's latest capital market assumptions make for uncomfortable reading if you're an allocator to private markets. Their expected real return for U.S. buyouts over the next 5-10 years is 4.2%. For U.S. large-cap public equities, it's 3.9%. That's a 30 basis point premium for accepting years of lockup, unpredictable capital calls, limited transparency, and the very real risk of picking the wrong manager. Private credit looks even worse. Expected returns dropped 0.5 percentage points year over year as spreads narrowed and base rates came down. The asset class that was supposed to be the sensible alternative to stretched equity valuations now offers less compensation than it did twelve months ago. Key findings: - AQR's 2026 assumptions show U.S.
buyouts returning 4.2% versus 3.9% for public equities, a 30bp premium that barely justifies years of lockup and manager selection risk - Venture capital dispersion is extreme: top-decile managers earn 31.7% IRR while the bottom decile returns negative 7%, meaning average returns compress as capital floods in - 87% of U.S. companies with over $100M in revenue are now private, and 55% of median value for 2020-2023 IPOs was created before going public, up from 12% for 2014-2019 IPOs - Private credit expected returns dropped 0.5 percentage points year over year to 2.6%, offering less compensation than twelve months ago as spreads narrowed ### [Against All Odds: The Mathematics of 'Provably Fair' Casino Games](https://philippdubach.com/posts/against-all-odds-the-mathematics-of-provably-fair-casino-games/) Published: 2026-01-25 Description: Statistical analysis of 20,000 crash game rounds verifies the 97% RTP claim. But at 179 rounds per hour, expected losses exceed five times the per-round stake every hour. Summary: Gambling can be harmful and lead to significant losses. Participation is subject to local laws and age restrictions. Always gamble responsibly. Need help? Visit BeGambleAware.org. Crash games represent a category of online gambling where players place bets on an increasing multiplier that can 'crash' at any moment. The fundamental mechanic requires players to cash out before the crash occurs; successful cash-outs yield the bet amount multiplied by the current multiplier, while failure results in total loss of the wager.
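The crash-game numbers follow from one distributional assumption, taken from the article's findings: a round survives to multiplier m with probability 0.97/m. The sketch below assumes the crash point can be drawn by inverse-transform sampling as 0.97/U for uniform U. Real provably-fair games use a seeded, verifiable generator and round to two decimals, which this ignores. It checks the hit rates, the 3% edge for any fixed cash-out target, and the hourly loss at 179 rounds per hour:

```python
import random

RTP = 0.97  # return to player; house edge = 1 - RTP

def survival_prob(m: float) -> float:
    """P(round reaches multiplier m before crashing), for m >= 1."""
    return min(1.0, RTP / m)

def sample_crash(rng: random.Random) -> float:
    """Inverse-transform draw with P(crash >= m) = RTP / m (values below 1 mean an instant crash)."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids division by zero
    return RTP / u

def simulate_return(target: float, rounds: int, seed: int = 42) -> float:
    """Average net return per unit staked when always cashing out at `target`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        total += (target - 1.0) if sample_crash(rng) >= target else -1.0
    return total / rounds

print(f"{survival_prob(2):.3f}")    # 0.485
print(f"{survival_prob(100):.4f}")  # 0.0097

# Analytically, any fixed target m returns m * RTP/m - 1 = -3% per round.
for target in (1.5, 2.0, 3.0, 5.0):
    print(target, round(simulate_return(target, 200_000), 4))

rounds_per_hour = 179
hourly_loss_in_stakes = rounds_per_hour * (1 - RTP)
print(f"{hourly_loss_in_stakes:.2f}")  # 5.37
```

Every fixed target carries the same −3% expectation, so the simulated averages should all land near −0.03. Changing the target changes the variance of a session, not its expected value.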
Key findings: - Statistical analysis of 20,000 crash game rounds confirms the 97% RTP claim: the estimated probability exponent is 1.98 versus a theoretical 2.0, within 2.2% accuracy - At 179 rounds per hour with 16-second median intervals and a 3% house edge per round, players face expected hourly losses of about 5.4 times their per-round stake (179 × 3% ≈ 537%) - Monte Carlo simulations of 10,000 sessions across four strategies (1.5x to 5x cash-outs) confirm every single strategy produces negative expected returns - The probability of reaching multiplier m before crashing equals 0.97/m, so a 2x target succeeds 48.5% of the time while 100x works in just under 1% of rounds (0.97%) ## Macro (8 articles) ### [People Live in Levels, Not Rates](https://philippdubach.com/posts/people-live-in-levels-not-rates/) Published: 2026-02-28 Description: Prices rose 25% since 2020 and won't come back. The levels-vs-rates problem explains the vibecession, the Stewart-Thaler debate, and why nobody trusts economists. Summary: "Economics doesn't take into account what's best for society. The goal of economics in a capitalist system is to make the most amount of money for your shareholders." That's Jon Stewart, during his February 4 conversation with a Nobel laureate about behavioral economics. Stewart hosted Richard Thaler on "The Weekly Show" for 92 minutes. Thaler, the Chicago Booth professor who won the 2017 Nobel for his work on how real humans deviate from rational-agent models, spent much of the conversation, in my reading, explaining definitional points Stewart's framing had not absorbed. Jason Furman, Harvard professor and former Obama CEA chair, recalled in a widely shared tweet that his own 2024 Stewart appearance was "the single worst interview I've ever done." Jerusalem Demsas wrote a critical rebuttal arguing Stewart's framing did not match what economics as a discipline actually claims.
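The levels-versus-rates point in the description is compounding arithmetic. A minimal sketch, normalizing the 2020 price level to 100 (an illustrative base) and using the ~25% cumulative rise and the 2.4% inflation rate quoted in the findings. Falling inflation only slows the climb; returning to the 2020 level would require outright deflation:

```python
BASE_2020 = 100.0             # illustrative normalization
level_now = BASE_2020 * 1.25  # ~25% cumulative CPI increase since 2020

def level_after(years: int, inflation: float, start: float = level_now) -> float:
    """Price level after `years` of constant annual inflation."""
    return start * (1.0 + inflation) ** years

# Disinflation to 2.4% still raises the level every year.
for years in (1, 3, 5):
    print(years, round(level_after(years, 0.024), 1))

# Deflation needed to return to the 2020 level: 1 - 100/125 = 20%.
required_drop = 1.0 - BASE_2020 / level_now
print(f"{required_drop:.0%}")  # 20%
```

Even at 0% inflation the level just stops rising. It does not come back down, which is the gap households notice when they compare today's prices with 2020.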
Key findings: - Cumulative CPI is up ~25% since 2020 (groceries up 29.4%, housing up 30-45%) and none of it reverses when inflation falls to 2.4% - Bottom-quartile wage growth collapsed from 7.5% to 3.5% in 2025, reversing pandemic-era compression that had closed a third of the post-1979 inequality gap - Consumer sentiment is at the 3rd percentile of its historical range despite 4.4% GDP growth, plausibly a measurement gap rather than only sentiment - Stewart's proposed mechanism on Thaler's show resembled the carbon tax Thaler had outlined moments earlier, an illustration of the economics communication problem more than a finished economic argument ### [Europe's $24 Trillion Payment Breakup Is Really a Bet on Infrastructure Arbitrage](https://philippdubach.com/posts/europes-24-trillion-payment-breakup-is-really-a-bet-on-infrastructure-arbitrage/) Published: 2026-02-16 Description: The EuroPA alliance connected 130 million users across 13 countries overnight. But this isn't really about sovereignty. It's an infrastructure arbitrage exploiting a 100-120bps spread between card network fees and SEPA Instant rails, accidentally protected by the EU's own regulation. Summary: On February 2, 2026, the European Payments Initiative signed a Memorandum of Understanding with the Alliance EuroPA, a consortium linking Spain's Bizum, Italy's Bancomat, Portugal's SIBS, and the Nordic Vipps MobilePay system. The deal connects 130 million users across 13 countries into a single interoperable payment network. Headlines framed it as Europe breaking up with Visa and Mastercard. The actual story is more interesting: Europe is attempting an infrastructure arbitrage that, if it works, could reprice how money moves across the continent. 
Key findings: - The EuroPA alliance connected 130 million users across 13 countries overnight, giving Wero the scale to challenge Visa and Mastercard's $4.7 trillion European transaction volume - Card transactions cost European merchants up to 2% versus Wero's proposed 0.77%, a 100-120 basis point structural arbitrage because account-to-account payments skip the card network layer entirely - The EU's 2015 interchange cap backfired: Visa and Mastercard shifted revenue to unregulated scheme fees that rose 33.9% between 2018 and 2022, nearly doubling the net merchant service charge - Mastercard has over 900 million branded cards in EU circulation versus Wero's 47 million users, and German adoption sits at only 5% of transaction volume despite being the first launch country ### [Britain's Strategic Limbo](https://philippdubach.com/posts/britains-strategic-limbo/) Published: 2026-01-28 Description: Britain faces strategic isolation: locked out of EU defense cooperation, unwilling to join Trump's coalition. The mid-Atlantic bridge has nowhere to land. Summary: The UK is the country with no bloc. At Davos, Britain refused to join Trump's Board of Peace, citing commitment to international law and rejection of the "pay-to-play" model. France, Germany, Sweden, Norway made the same choice. The difference is that those countries have somewhere else to go. Britain doesn't. The SAFE instrument, the EU's €150 billion fund for joint defense procurement, is designed explicitly for strategic autonomy. Strict "Buy European" provisions limit non-EU subcontractors to 15-35% of contract value, phased out within two years. Canada, remarkably, negotiated access and now has preferential treatment on par with EU firms. The UK remains excluded. 
Key findings: - The EU's SAFE fund limits non-EU subcontractors to 15-35% of contract value, and the UK rejected participation over sovereignty concerns that mirror the logic of Brexit itself - Canada negotiated SAFE access on par with EU firms while Britain remains excluded, illustrating that principles without alternatives amount to isolation - Procurement cycles last decades, so structural exclusion from European defense contracts now means the UK defense industrial base erodes with each passing year ### [The Rise of Middle Power Realism](https://philippdubach.com/posts/the-rise-of-middle-power-realism/) Published: 2026-01-27 Description: At Davos 2026, Carney told allies to take down the signs of the liberal order. Middle powers are learning to navigate between giants without illusions. Summary: At Davos 2026, Canadian Prime Minister Mark Carney delivered a speech that received something rare at these gatherings: a standing ovation. Carney told the assembled elites what they already knew but hadn't said aloud: the world is not in a "transition" but a "rupture." The speech drew on Václav Havel's 1978 essay The Power of the Powerless, specifically the parable of the greengrocer who displays the slogan "Workers of the World, Unite!" in his shop window. The grocer doesn't believe the slogan. He displays it to signal submission, to live in harmony with the regime. Carney's application was pointed: for years, US allies have displayed the signs of the liberal international order, pretending the partnership was mutual, that rules mattered, that values were shared, even as reality diverged.
Key findings: - At Davos 2026, Carney declared the world has experienced 'a rupture, not a transition' and used Havel's greengrocer parable to argue that allies have been displaying signs of a liberal order that no longer exists - Canada joined the EU's SAFE defense fund, a 150 billion euro procurement program, becoming the first non-European G7 nation with preferential access to European defense markets - Canada secured a preliminary trade deal with China on 49,000 EVs at a 6.1% tariff, compared to the 100% tariff the U.S. imposes, demonstrating the leverage that comes from diversified partnerships - The EU threatened to deploy its Anti-Coercion Instrument against the United States during the Greenland crisis, the first time the bloc signaled willingness to wage a trade war against its primary security guarantor ### [Big in Japan](https://philippdubach.com/posts/big-in-japan/) Published: 2026-01-19 Description: Japan holds $5 trillion in foreign assets. With 30-year JGB yields now above 3%, the carry trade that defined Japanese investing faces new friction. Summary: Japan holds roughly $5 trillion in foreign assets. The US alone accounts for ¥342 trillion in bonds and equities. Japanese 30-year yields sat below 1% from 2019 through early 2024. They're now above 3%. The yield spread between developed market bonds and JGBs has collapsed from 400 basis points to roughly 100. The yen carry trade that has defined Japanese institutional behavior since the 1990s (borrow cheap at home, invest abroad for yield) suddenly faces added friction. Key findings: - Japan holds roughly $5 trillion in foreign assets and is the largest foreign holder of U.S.
Treasuries at over $1.1 trillion - Japanese 30-year government bond yields rose from below 1% through early 2024 to above 3%, collapsing the yield spread versus developed market bonds from 400 basis points to roughly 100 - The August 2024 yen carry trade unwind dropped the S&P 6% in three days, and that was just positioning adjustment, not actual repatriation of Japan's institutional foreign holdings - Treasury market depth has deteriorated since 2020, meaning a sustained seller of size would arrive into a market less equipped to absorb flow than at any point since the GFC ### [Repo might be even bigger than we thought](https://philippdubach.com/posts/repo-might-be-even-bigger-than-we-thought/) Published: 2026-01-13 Description: New OFR data reveals $12.6 trillion in daily repo exposures, $700 billion larger than previous estimates. The plumbing of modern money remains poorly understood. Summary: "Finance is anthropological." That's Zoltan Pozsar, the Hungarian-American economist who mapped the plumbing of modern money before most people knew there was plumbing to map. When he said it to Bloomberg in 2019, he was trying to explain why repo markets (the overnight lending infrastructure that lubricates trillions in daily transactions) had just seized up in ways the Federal Reserve didn't anticipate. I've written about Pozsar's work before, particularly his "Bretton Woods III" thesis about the shifting role of the dollar. But his earlier research on shadow banking and repo markets feels increasingly relevant as we enter 2026. In December 2025, the Office of Financial Research published new data on the size of the U.S. repo market. The number: $12.6 trillion in average daily exposures. That's roughly $700 billion larger than previous estimates, a measurement error roughly the size of the entire Swiss banking system. Key findings: - New OFR data puts the U.S.
repo market at $12.6 trillion in daily exposures, $700 billion larger than previous estimates, a measurement gap roughly the size of the Swiss banking system - Bilateral repo accounts for $5 trillion of daily activity, roughly 40% of the market, and was essentially invisible to regulators until OFR transaction-level collection reached full implementation in July 2025 - The Fed's Standing Repo Facility hit record usage of $74.6B on December 31, 2025, while reserves fell to $2.8 trillion, their lowest in four years - Only 61.8% of repo collateral is Treasuries, leaving substantial room for corporate bonds and agency MBS that can gap in value during stress ### [Pozsar's Bretton Woods III: Three Years Later [2/2]](https://philippdubach.com/posts/pozsars-bretton-woods-iii-three-years-later-2/2/) Published: 2025-10-26 Description: Gold above $4,000, Treasury holdings below $7T, but the dollar still dominates 88% of FX volumes. What Pozsar's Bretton Woods III got right and wrong. Summary: Start by reading Pozsar's Bretton Woods III: The Framework [1/2]. Now, what actually happened in the three years since Pozsar published the Bretton Woods III framework? (1) Dollar reserve diversification is happening, but gradually: Foreign central bank Treasury holdings declined from peaks exceeding $7.5 trillion to levels below $7 trillion. This represents steady diversification away from dollar-denominated assets, though not a dramatic collapse. (2) Gold has performed strongly: From roughly $1,900/oz when Pozsar published his dispatches to peaks above $4,000/oz today, gold has appreciated substantially, consistent with increased central bank gold buying and demand for "outside money." (3) Alternative payment systems are developing: Various nations continue building infrastructure for non-dollar trade settlement. While these systems remain in preliminary stages rather than fully operational alternatives to SWIFT, development timelines could speed up following specific triggering events.
(4) The dollar itself stayed strong, until this year: Perhaps surprisingly given predictions of reserve currency decline, in 2024 the dollar posted its best performance against a basket of major currencies since 2015. This year, however, the DXY index (which tracks the dollar against a basket of major trading partners' currencies) fell about 11%, ending that decade-long rally. (5) Commodity collateral is increasingly important: Research on commodities as collateral shows that under capital controls and collateral constraints, investors import commodities and pledge them as collateral. Higher collateral demands increase commodity prices and affect the inventory-convenience yield relationship. Key findings: - Foreign central bank Treasury holdings fell from $7.5T to below $7T while gold rose from $1,900 to above $4,000/oz, consistent with gradual reserve diversification away from dollar assets. - Foreign ownership of U.S. Treasuries dropped from above 50% to 30%, but the dollar still dominates 88% of FX volumes, showing de-dollarization is real but slow. - The dollar posted its best year since 2015 in 2024 before declining sharply in 2025, complicating any simple narrative of dollar collapse or reserve currency decline. - Pozsar's most durable insight: central banks control the nominal domain but not the real domain, meaning supply-driven commodity inflation does not respond well to rate hikes. ### [Pozsar's Bretton Woods III: The Framework [1/2]](https://philippdubach.com/posts/pozsars-bretton-woods-iii-the-framework-1/2/) Published: 2025-10-25 Description: How freezing Russian reserves sparked Bretton Woods III: Pozsar's framework on inside money, outside money, and the shift to a commodity-backed monetary order. Summary: In March 2022, as Western nations imposed unprecedented sanctions following Russia's invasion of Ukraine, Zoltan Pozsar published a series of dispatches that would become some of the most discussed pieces in financial markets that year.
The core thesis was stark: we were witnessing the birth of "Bretton Woods III," a fundamental shift in how the global monetary system operates. More than three years later, with more data on de-dollarization trends, commodity market dynamics, and structural changes in global trade, it's worth revisiting this framework. Key findings: - Freezing Russian reserves in 2022 introduced confiscation risk to assets previously considered risk-free, triggering a shift from inside money (Treasuries) to outside money (commodities, gold). - Non-U.S. banks hold $16 trillion in dollar assets but lack access to the Fed's emergency facilities, creating structural vulnerability whenever dollars become scarce globally. - Rerouting Russian oil from 2-week Baltic-to-Europe voyages to 4-month Asia routes tied up roughly 10% of global VLCC capacity and multiplied commodity financing demands. - Perry Mehrling's four prices of money (par, interest, exchange rate, price level) form Pozsar's analytical backbone, with central banks able to manage the first three but not commodity-driven inflation. ## Economics (5 articles) ### [When AI Labs Become Defense Contractors](https://philippdubach.com/posts/when-ai-labs-become-defense-contractors/) Published: 2026-03-01 Description: The Anthropic-Pentagon standoff isn't an ethics story. It's a replay of the 1993 Last Supper that consolidated 51 defense primes into 5, at Silicon Valley speed. Summary: Lockheed started by building Amelia Earhart's favorite plane. Then came a government loan guarantee in 1971 (the L-1011 TriStar nearly killed the company), a Cold War, decades of consolidation, and now a business that earns 92.5% of its revenue from government contracts, with the F-35 alone accounting for 26% of its $71 billion in annual sales. The process took about 50 years. AI labs becoming defense contractors will happen faster. On February 27, 2026, two things happened within hours of each other.
President Trump ordered every federal agency to "IMMEDIATELY CEASE all use of Anthropic's technology" after CEO Dario Amodei refused to strip safety constraints from Claude's Pentagon deployment, specifically prohibitions on mass domestic surveillance and fully autonomous weapons. Defense Secretary Pete Hegseth then labeled Anthropic a "Supply-Chain Risk to National Security," a designation previously reserved for foreign adversaries like Huawei, never before applied to an American company. That evening, Sam Altman announced that OpenAI had signed a deal to deploy its models on the Pentagon's classified network, posting that the Department of War "displayed a deep respect for safety." (Whether that reflects the Pentagon's actual position or Altman's political optimism remains unclear for now.) Key findings: - The FY2026 Pentagon AI budget jumped to $13.4 billion from $1.8 billion, a 7x increase in a single budget cycle, nearly matching Anthropic's entire annualized revenue of $14 billion. - After the 1993 Last Supper, 51 prime defense contractors collapsed into 5 within four years. AI labs face the same consolidation logic, just faster: through classified network access and government-funded compute rather than M&A. - IDIQ (indefinite-delivery, indefinite-quantity) contracts account for 56% of DoD award dollars and run five years with extensions. Once embedded in classified systems with a security-cleared workforce (243-day average clearance processing), switching costs become close to prohibitive. - Palantir's trajectory previews the endgame: $4.48 billion FY2025 revenue (up 56%), 53.7% from government, now worth nearly twice Boeing at $320 billion market cap. ### [Economics of a Super Bowl Ad](https://philippdubach.com/posts/economics-of-a-super-bowl-ad/) Published: 2026-02-20 Description: A 30-second Super Bowl spot costs $8M. The real price is $16–23M. The ROI evidence is mixed. A deep look at the pricing, the prisoner's dilemma, and the NFL. Summary: A 30-second Super Bowl ad costs $8 million.
That's $267,000 per second, roughly the median U.S. home price for every tick of the clock. Super Bowl LX drew 124.9 million average viewers with a peak of 137.8 million, the highest peak audience in American television history. The NFL accounted for 84 of the top 100 most-watched U.S. telecasts in 2025. The Oscars, by comparison, managed 19.7 million. Zachariah Reitano, CEO of the direct-to-patient telehealth company Ro, writing from direct experience as a 2026 Super Bowl advertiser, published a detailed cost breakdown based on his own spending and interviews with 10+ brands. The picture that emerges is considerably more expensive than the headline number. Production runs $1–4 million for studio, crew, and post-production before any famous face enters the frame. Celebrity endorsement talent adds $1–5 million, with the current A-list sweet spot at $3–5 million according to WME agent Tim Curtis. Then comes the companion buy: for every 30-second slot, advertisers are generally required to commit to spending an equivalent amount on other programs broadcast by the same network. For NBC's 2026 Super Bowl, that meant additional inventory across the Winter Olympics and NBA All-Star Game, adding another $7–10 million to the tab. Key findings: - A 30-second Super Bowl spot costs $8M but $16–23M fully loaded with production, talent, and mandatory companion buys - Stanford research shows that when competing brands both advertise, the benefits cancel out, a prisoner's dilemma the NFL exploits for rising prices - The NFL is the last monoculture in American media: 84 of the top 100 most-watched US telecasts in 2025 - A single Super Bowl ad generates the same brand-search engagement as 1,056 typical primetime ads ### [Ozempic is Reshaping the Fast Food Industry](https://philippdubach.com/posts/ozempic-is-reshaping-the-fast-food-industry/) Published: 2026-01-16 Description: Cornell research: GLP-1 users cut grocery spending 5.3%, fast food 8%.
With 16% household adoption and savory snacks down 10%, food stocks face headwinds. Summary: Something strange is happening in the food industry. New US dietary guidelines call for more protein and less sugar. Greggs, the UK bakery chain, just warned of "flatlining profits" in the food-to-go market. Food companies are racing to overhaul their brands, ditching artificial dyes and packing protein into products. Earnings calls across the sector blame "inflation" and "subdued consumer confidence." Nobody mentions the elephant in the room: GLP-1 medications. New research from Cornell finally puts numbers to what the food industry doesn't want to discuss. Using transaction data from 150,000 households linked to survey responses on medication adoption, Sylvia Hristakeva, Jūra Liaukonytė, and Leo Feler tracked exactly how Ozempic and Wegovy users change their spending. The results deserve attention from anyone holding food stocks. Key findings: - Cornell research on 150,000 households shows GLP-1 users cut grocery spending 5.3% within six months, with fast food down 8.0% and savory snacks hit hardest at 10.1% - 16.3% of U.S. households already have at least one GLP-1 user as of July 2024, with nearly half taking the medication for weight loss rather than diabetes - About 34% of users discontinue GLP-1 medications, and when they stop, candy and chocolate purchases rise 11.4% above pre-adoption levels, suggesting the drugs suppress appetite without teaching new habits - High-income households show steeper spending declines at 8.2%, and these are the most profitable fast food customers, creating a double loss of volume and margin ### [Does AI mean the demand on labor goes up?](https://philippdubach.com/posts/does-ai-mean-the-demand-on-labor-goes-up/) Published: 2026-01-15 Description: AI was supposed to free us. The Jevons paradox plays out in real time: efficiency expands workload, not leisure. 77% of workers say AI added to their work. 
Summary: Joe Weisenthal from Bloomberg, this week: "All my shower thoughts now are about designing efficient workflows for synthesizing, collecting, labeling and annotating data." Same. Since I started building every app and tool I thought would make my life easier, my workflow more efficient, I haven't stopped. Apparently non-developers are now writing apps instead of buying them. This is the AI productivity paradox in miniature: the tools get better and we do more, not less. Key findings: - Workers in AI-exposed occupations now work roughly 3 extra hours per week, and leisure time has dropped by the same amount, according to NBER research - 77% of employees say AI tools have added to their workload, not reduced it, per Upwork's survey data - Only 21% of employees use time saved by AI for personal life, with the rest reinvesting it directly back into work - The Jevons paradox from 1865 predicted this: more efficient steam engines increased coal consumption, and more efficient AI tools are increasing work output expectations the same way ### [Agent-based Systems for Modeling Wealth Distribution](https://philippdubach.com/posts/agent-based-systems-for-modeling-wealth-distribution/) Published: 2025-08-30 Description: Agent-based modeling shows how random market transactions naturally produce extreme wealth concentration, and why even a small wealth tax changes everything. Summary: A question Gary Stevenson, the self-proclaimed best trader in the world, has been asking for some time is if a wealth tax can fix Britain's economy. [...] he believed the continued parlous state of the economy would halt any interest rate hikes. The reason? Because when ordinary people receive money, they spend it, stimulating the economy, while the wealthy tend to save it. But our economic model promotes the concentration of wealth among a select few at the expense of everybody else's living standards. Key findings: - The Affine Wealth Model matches 27 years of U.S.
wealth data with less than 0.16% average error, showing extreme concentration emerges from random transactions alone. - Even a 1% wealth tax shifts the simulated distribution from Pareto extremes to a stable equilibrium where top agents hold at most 3-4x their starting wealth. - Roughly 10% of the U.S. population holds negative net worth, a feature the Affine Wealth Model captures by allowing agents to go below zero. ## Medicine (6 articles) ### [Why Lilly's Weight Loss Pill Isn't a Peptide](https://philippdubach.com/posts/why-lillys-weight-loss-pill-isnt-a-peptide/) Published: 2026-04-09 Description: Oral semaglutide destroys 99% of its active ingredient per dose. Lilly's Foundayo skips the problem entirely. Inside the $70B oral GLP-1 pill race. Summary: Novo Nordisk spent decades and $1.8 billion learning how to get a peptide past the gut. Eli Lilly looked at the same problem and decided to skip it entirely. Key findings: - Oral semaglutide has reported bioavailability of roughly 0.4 to 1% per the EMA assessment report, meaning each pill discards most of its active ingredient before absorption - Eli Lilly's Foundayo (orforglipron) is a small molecule, not a peptide, sidestepping the oral peptide delivery problem that took the field over a century to address - Combined Novo Nordisk and Eli Lilly GLP-1 revenue hit roughly $70B in 2025; the statin precedent (197% user growth after atorvastatin went generic) suggests oral pills may expand rather than cannibalise total volume - GLP-1 penetration is under 5% of eligible US adults vs. 35%+ for statins, implying substantial room for market expansion ### [AI Can Now Design Drugs in Seconds; We Still Can't Tell You If They Work.](https://philippdubach.com/posts/ai-can-now-design-drugs-in-seconds-we-still-cant-tell-you-if-they-work./) Published: 2026-03-18 Description: IsoDDE doubles AlphaFold 3 on hard benchmarks and beats physics-based gold standards. But no AI drug has FDA approval.
What $4B in pharma deals actually mean. Summary: No AI-discovered drug has ever received FDA approval. That sentence should sit uncomfortably next to every headline about Alphabet's drug discovery spinoff. On February 10, Isomorphic Labs, the Google DeepMind spinoff focused on computational drug design, released IsoDDE: its Drug Design Engine. This isn't a model or an AlphaFold upgrade. IsoDDE is a unified in silico drug discovery system that runs protein structure prediction, ligand binding, affinity estimation, and pocket identification in concert, generating in seconds what used to take days of physics-based simulation. On the hardest molecular prediction tasks, the "Runs N' Poses" benchmark designed to test generalization to unfamiliar proteins, IsoDDE hits a 50% success rate. AlphaFold 3 manages roughly 23%. On antibody-antigen modeling, IsoDDE beats AlphaFold 3 by 2.3× and the open-source Boltz-2 by 19.8×. On binding affinity prediction, it achieves a Pearson correlation of 0.85, beating the physics-based gold standard FEP+ at 0.78. I would assume that these are large enough improvements that the computational bottleneck in drug design may no longer be the binding question.
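IsoDDE's binding-affinity result is reported as a Pearson correlation (0.85, versus 0.78 for FEP+), which measures how linearly predicted affinities track measured ones. A minimal sketch of the statistic in plain Python; the function name is illustrative and the affinity values are hypothetical, not IsoDDE or FEP+ outputs:

```python
from __future__ import annotations

import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/(n-1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted binding affinities (e.g. pKd), illustration only.
measured = [5.1, 6.3, 7.0, 7.8, 8.4, 9.2]
predicted = [5.6, 6.0, 7.4, 7.5, 8.9, 8.8]
print(round(pearson_r(measured, predicted), 2))
```

Because r is unchanged when predictions are shifted or multiplied by a positive constant, a high value says the model orders and spaces compounds consistently with experiment; on its own it does not guarantee small absolute errors.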
Key findings: - IsoDDE hits 50% on the hardest protein-ligand prediction benchmark versus 23% for AlphaFold 3 and beats the physics-based gold standard FEP+ on binding affinity with a 0.85 Pearson correlation - AI-discovered drugs show 80-90% Phase I success rates versus a 40-65% historical average, but Phase II efficacy rates remain roughly 40% for both AI and traditional drugs - Isomorphic's pharma deals total over $4 billion in headline value but only $82.5 million in upfront cash, a 50:1 ratio that reflects how much pharma is betting on contingent outcomes - No AI-discovered drug has received FDA approval as of February 2026, and Isomorphic targets its first clinical candidates for late 2026 ### [Novo Was Europe's Most Valuable Company](https://philippdubach.com/posts/novo-was-europes-most-valuable-company/) Published: 2026-02-23 Description: Novo Nordisk lost 75% since June 2024. CagriSema failed vs Zepbound, US pricing is resetting lower, and Lilly leads on every axis. Full breakdown with numbers. Summary: Novo Nordisk was Europe's most valuable company 20 months ago. Today its market capitalization falls behind ASML, LVMH, Hermès, L'Oréal, SAP, Prosus, Siemens, Inditex, Deutsche Telekom, and Santander. The stock has lost roughly 75% since its June 2024 peak of $142.44, falling from a $640 billion market cap to under $160 billion. Shares dropped another 16% this morning after Novo Nordisk announced that CagriSema, the follow-on obesity drug, did not meet the primary endpoint in the REDEFINE 4 open-label head-to-head trial against Eli Lilly's Zepbound. 
Key findings: - Novo Nordisk's market capitalisation declined approximately 75% from its June 2024 peak (roughly $640B to under $160B) and the company guided for its first revenue decline in modern history, with 2026 adjusted sales growth in a range of minus 5 to minus 13% - CagriSema did not meet the non-inferiority primary endpoint against Zepbound in REDEFINE 4, with reported weight-loss differences of 2.5 to 3.4 percentage points across estimands - Semaglutide's compound patent lapsed in Canada in January 2026 after a maintenance fee was missed, and generic filings have been reported in multiple jurisdictions - Novo trades at roughly 11x forward earnings, a notable compression from its 2024 peak multiple, broadly consistent with pricing in a degraded growth profile ### [Ozempic is Reshaping the Fast Food Industry](https://philippdubach.com/posts/ozempic-is-reshaping-the-fast-food-industry/) Published: 2026-01-16 Description: Cornell research: GLP-1 users cut grocery spending 5.3%, fast food 8%. With 16% household adoption and savory snacks down 10%, food stocks face headwinds. Summary: Something strange is happening in the food industry. New US dietary guidelines call for more protein and less sugar. Greggs, the UK bakery chain, just warned of "flatlining profits" in the food-to-go market. Food companies are racing to overhaul their brands, ditching artificial dyes and packing protein into products. Earnings calls across the sector blame "inflation" and "subdued consumer confidence." Nobody mentions the elephant in the room: GLP-1 medications. New research from Cornell finally puts numbers to what the food industry doesn't want to discuss. Using transaction data from 150,000 households linked to survey responses on medication adoption, Sylvia Hristakeva, Jūra Liaukonytė, and Leo Feler tracked exactly how Ozempic and Wegovy users change their spending. The results deserve attention from anyone holding food stocks. 
Key findings: - Cornell research on 150,000 households shows GLP-1 users cut grocery spending 5.3% within six months, with fast food down 8.0% and savory snacks hit hardest at 10.1% - 16.3% of U.S. households already have at least one GLP-1 user as of July 2024, with nearly half taking the medication for weight loss rather than diabetes - About 34% of users discontinue GLP-1 medications, and when they stop, candy and chocolate purchases rise 11.4% above pre-adoption levels, suggesting the drugs suppress appetite without teaching new habits - High-income households show steeper spending declines at 8.2%, and these are the most profitable fast food customers, creating a double loss of volume and margin ### [GLP-1 Receptor Agonists in ASUD Treatment](https://philippdubach.com/posts/glp-1-receptor-agonists-in-asud-treatment/) Published: 2025-11-21 Description: A phase 2 RCT shows low-dose semaglutide reduces alcohol craving and heavy drinking with effect sizes comparable to naltrexone. What GLP-1 means for AUD treatment. Summary: Alcohol and other substance use disorders (ASUDs) are complex, multifaceted, but treatable medical conditions with widespread medical, psychological, and societal consequences. However, treatment options remain limited; therefore, the discovery and development of new treatments for ASUDs is critical. Glucagon-like peptide-1 receptor agonists (GLP-1RAs), currently approved for the treatment of type 2 diabetes mellitus, obesity, and obstructive sleep apnea, have recently emerged as potential new pharmacotherapies for ASUDs. Semaglutide is one of several GLP-1 receptor agonists being studied as a candidate pharmacotherapy for alcohol use disorder, an indication for which it is not approved in any jurisdiction. This research matters for people with substance use disorders, for whom the pharmacotherapy landscape remains thin.
In February 2025, researchers at UNC published results from a randomized controlled trial of semaglutide in AUD. The phase 2 trial enrolled 48 non-treatment-seeking adults with AUD and administered low-dose semaglutide over a nine-week titration schedule below standard weight-loss dosing. Participants on semaglutide consumed less alcohol in controlled laboratory settings and reported fewer drinks per drinking day in their normal lives. They also reported less craving for alcohol. Heavy drinking episodes declined more sharply in the semaglutide group compared to placebo over the nine-week trial. The mechanism likely involves GLP-1 receptors in the brain's mesolimbic reward pathway, where the molecule modulates dopamine signaling to reduce the reinforcing effects of alcohol consumption. The reported effect sizes for some drinking outcomes were broadly in the range published for naltrexone, one of three FDA-approved AUD medications, but the studies are not head-to-head and the sample sizes are small. A large real-world database study of 83,825 patients with obesity also found semaglutide associated with a 50-56% lower risk of AUD incidence and recurrence compared to other anti-obesity medications, an observational signal that needs prospective replication. Larger trials are needed to confirm these early results. Phase 3 trials evaluating semaglutide for AUD are now underway, and pemvidutide, a GLP-1/glucagon dual receptor agonist, has received FDA Fast Track designation for alcohol use disorder. Key findings: - A phase 2 RCT of 48 non-treatment-seeking adults reported that low-dose semaglutide reduced alcohol craving and heavy drinking episodes versus placebo, an unapproved indication in any jurisdiction. - Reported effect sizes for some drinking outcomes were broadly in the range published for naltrexone, one of three FDA-approved medications for alcohol use disorder, though direct head-to-head trials have not been conducted. 
- New AUD therapies have been approved at a rate of roughly one every 25 years, making the GLP-1 receptor agonist class an unusually active early-stage research area. ### [Novo Nordisk's Post-Patent Strategy](https://philippdubach.com/posts/novo-nordisks-post-patent-strategy/) Published: 2025-06-29 Description: Novo Nordisk's lead replacement candidate amycretin reported Phase 1 data in The Lancet. How the company is positioning around the 2031 Ozempic patent cliff. Summary: Novo Nordisk currently sits atop a roughly $20 billion Ozempic/Wegovy franchise that faces patent expiration in 2031, leaving roughly seven years to establish a successor. We revisit the company today because Phase 1 results for amycretin, Novo's lead replacement candidate, have just been published in The Lancet. The primary published metrics are weight-loss percentages versus placebo over a 36-week period; readers interested in head-to-head comparisons against currently approved therapies should consult the trial publications directly rather than rely on cross-trial inferences. Amycretin is a peptide combining the semaglutide GLP-1 mechanism with amylin receptor agonism, designed to engage two satiety pathways simultaneously rather than one. Elaine Chen at STAT covered the trial with a focus on the dose-response data; the full text is behind a paywall. Key findings: - Novo Nordisk's core Ozempic patent expires December 2031, framing a seven-year window in which the company must establish next-generation candidates such as amycretin. - Complex peptide manufacturing and patented injection devices create a capacity constraint that generic competitors cannot quickly replicate, regardless of when small-molecule chemistry goes off patent. - Novo's pipeline runs multiple shots on goal (amycretin, NNC-0519, NNC-0662, cagrilintide combinations) rather than betting the franchise on a single replacement molecule.
## Tech (11 articles) ### [Karpathy's Software 3.0 Playbook](https://philippdubach.com/posts/karpathys-software-3.0-playbook/) Published: 2026-05-01 Description: Twelve lessons from Andrej Karpathy's Sequoia interview: Software 3.0, vibe coding versus agentic engineering, jagged intelligence, and why December 2024 was the inflection most people missed. Summary: Andrej Karpathy is one of the few people who has both built modern AI and explained it for the rest of us. He co-founded OpenAI, ran computer vision at Tesla (where he got Autopilot working), and his courses on neural networks are some of the most-watched lectures on the internet. He also has a habit of naming the era we're already in. "Vibe coding" was his. "Software 3.0" looks like the next one. Key findings: - Karpathy marks December 2024 as the inflection where agentic coding crossed from babysitting to trust, invisible to anyone whose mental model is still anchored to ChatGPT. - The GPT-3.5 to GPT-4 chess jump shows capability tracks whatever frontier labs feed into reinforcement learning, so verifiable domains automate first regardless of economic value. - In Software 3.0 the unit of programming shifts from a function to a paragraph, the context window is the program, and the LLM is the interpreter. - Vibe coding raises the floor for non-engineers while agentic engineering raises the ceiling for professionals well past the old 10x benchmark. ### [On-Device AI Models Will Be The New Reason to Upgrade Your Phone](https://philippdubach.com/posts/on-device-ai-models-will-be-the-new-reason-to-upgrade-your-phone/) Published: 2026-03-25 Description: Smartphones haven't had a compelling upgrade story in years. On-device AI models, distilled from frontier systems like Gemini, are about to change that. Parameters are the new megapixels. Summary: The iPhone 17 runs a 3 billion parameter language model on-device at 30 tokens per second.
Obviously, the average consumer has no idea what that sentence means, and Apple hasn't figured out how to make them care. Key findings: - The global smartphone replacement cycle has stretched to 3.5 years because cameras, screens, and processors stopped providing meaningful generational differences. - Apple's 3 billion parameter on-device Foundation Model runs at 30 tokens per second on an iPhone 15 Pro, but distilling from Google's full Gemini could push future on-device models far beyond that ceiling. - Gartner projects GenAI smartphone spending will hit $393 billion in 2026, a 32% jump from 2025, with nearly 100% of premium devices featuring GenAI capabilities by 2029. - Parameter counts risk becoming the next megapixel myth, a single number that marketing departments can inflate while actual on-device experience depends on quantization, distillation quality, and NPU architecture. ### [The Last Architecture Designed by Hand](https://philippdubach.com/posts/the-last-architecture-designed-by-hand/) Published: 2026-03-16 Description: The transformer's limits are now mathematical proofs, not empirical hunches. Hybrids are in production. AI is searching for its own replacement. Here's what comes after. Summary: "I bet there is another new architecture to find that is gonna be as big of a gain as transformers were over LSTMs." Sam Altman, the CEO of the company most invested in the transformer, is telling a room of students it isn't the final form. So what comes after the transformer? He's probably right that something will, and the evidence is no longer anecdotal. Several recent papers have proved that the transformer's worst properties are structural: not engineering problems to be fixed with better data or more compute, but mathematical lower bounds. Key findings: - Mathematical proofs now show that quadratic scaling, hallucination, and positional bias are structural properties of the transformer, not fixable with better training data or RLHF.
- Over 60% of frontier models released in 2025 use Mixture of Experts, and production hybrids like Jamba and Qwen3-Next blend attention with state space models at 3x throughput. - AlphaEvolve found a 23% speedup in a kernel inside Gemini's own architecture, cutting Gemini's training time by 1%, and separately recovered 0.7% of Google's total compute resources through an improved data center scheduling heuristic. - OpenAI's inference spending hit $2.3 billion in 2024, 15x what they spent training GPT-4.5, meaning the economic center of gravity has already shifted from training to inference. ### [MCP vs A2A in 2026: How the AI Protocol War Ends](https://philippdubach.com/posts/mcp-vs-a2a-in-2026-how-the-ai-protocol-war-ends/) Published: 2026-03-15 Description: MCP leads with 97M monthly SDK downloads and 10,000+ servers. A2A fills a different layer. Analysis of the agentic AI standards war with historical parallels. Summary: On March 26, 2025, Sam Altman posted a three-sentence message that began: "people love MCP and we are excited to add support across our products." MCP is Anthropic's Model Context Protocol. OpenAI is Anthropic's most direct competitor. Altman was endorsing a rival's standard. That post may be the most significant event in enterprise AI infrastructure this year. When your main competitor adopts your protocol, the war is close to over. I've been watching this play out since Anthropic launched MCP in November 2024, and I want to work through what's happening: who controls what, what "interoperability" means in practice, and whether any of this follows patterns we've seen before. Key findings: - MCP reached 10,000+ servers and 97 million monthly SDK downloads before A2A launched, compounding a five-month head start into a structural ecosystem lead. - OpenAI adopting MCP in March 2025 mirrors the iMac's USB-only bet in 1998: one player so central to the ecosystem that their adoption made the standard inescapable.
- The agentic AI market is $7-8 billion in 2025, with analyst projections ranging from $50 billion to $199 billion by 2034 at 40-50% annual growth. - 53% of MCP servers still rely on static credentials rather than OAuth, and a critical npm package vulnerability (CVE-2025-6514) exposed 437,000+ installations to shell injection. ### [93% of Developers Use AI Coding Tools. Productivity Hasn't Moved.](https://philippdubach.com/posts/93-of-developers-use-ai-coding-tools.-productivity-hasnt-moved./) Published: 2026-03-04 Description: METR found experienced developers 19% slower with AI, despite feeling 20% faster. At 92.6% adoption, organizational productivity gains remain roughly 10%. Summary: A study published in July 2025 gave AI coding tools their most credible test yet. Sixteen experienced open-source developers, 246 real tasks, randomized controlled design. The researchers expected to measure how much faster AI made them. What they found: developers using AI took 19% longer to complete tasks than those working without it. The developers themselves thought they were 20% faster. That 39-point gap between perception and reality is the most important number in METR's paper. It lands inside two years of adoption data pointing in the opposite direction. DX surveyed 121,000 developers across 450+ companies and found 92.6% use AI coding tools at least monthly. JetBrains' AI Pulse measured 93%. The DORA 2025 report put it at 90%. On the productivity side: six independent research efforts converge on roughly the same ceiling, 10% at the system level, if you're being generous. Key findings: - A randomized controlled study found experienced developers using AI took 19% longer to complete tasks while believing they were 20% faster, a 39-point perception gap. - Writing code accounts for 25-35% of the software development lifecycle, so even a 100% coding speedup yields at most 15-25% system improvement under Amdahl's Law.
- Teams with high AI adoption merged 98% more pull requests but saw review time increase 91%, with DORA delivery metrics unchanged across 10,000+ developers. - At 92.6% monthly adoption and 27% of production code AI-generated, six independent research efforts converge on roughly 10% organizational productivity gains. ### [Building a No-Tracking Newsletter from Markdown to Distribution](https://philippdubach.com/posts/building-a-no-tracking-newsletter-from-markdown-to-distribution/) Published: 2025-12-24 Description: Build a privacy-focused newsletter with Python, Cloudflare Workers KV, and Resend API. Zero tracking, zero cost, full control. Open source code included. Summary: Friends have been asking how they can stay up to date with what I'm working on and keep track of the things I read, write, and share. RSS feeds don't seem to be en vogue anymore, apparently. So I built a mailing list. What else would you do over the Christmas break? Key findings: - The entire newsletter pipeline runs at zero cost using Cloudflare Workers KV for subscribers, R2 for hosting, and Resend's free tier for 3,000 emails per month. - The system sends no tracking pixels, no click tracking, and no external analytics, just rendered HTML from Markdown with table-based layout for email client compatibility. - A Python engine fetches OpenGraph metadata, generates LinkedIn-style preview cards, and optimizes images at 240px width for retina displays, all automated from a single .md file. ### [Visualizing Gradients with PyTorch](https://philippdubach.com/posts/visualizing-gradients-with-pytorch/) Published: 2025-08-23 Description: Build the right mental model for gradients with this PyTorch visualization tool. 2D surface plots with gradient vectors show the direction of steepest ascent. Summary: Gradients are one of the most important concepts in calculus and machine learning, but they are often poorly understood.
Trying to understand them better myself, I wanted to build a visualization tool that helps me develop the correct mental picture of what the gradient of a function is. I came across GistNoesis/VisualizeGradient and built my own iteration from there. This mental model generalizes beautifully to higher dimensions and is the foundation for understanding optimization algorithms like gradient descent. The colored surface shows function values. Black arrows show gradient vectors in the input plane (x-y space), pointing toward the direction of steepest ascent. Key findings: - Gradient vectors live in the input plane and point toward steepest ascent, which is why moving opposite to them in gradient descent moves toward lower loss values. - Surface plots with overlaid gradient arrows show the relationship between function terrain and optimization direction more clearly than contour plots alone. - The same gradient intuition from 2D visualizations generalizes directly to neural networks with millions of parameters, where each component is a partial derivative with respect to one weight. ### [Counting Cards with Computer Vision](https://philippdubach.com/posts/counting-cards-with-computer-vision/) Published: 2025-07-06 Description: How I trained a YOLOv11 model to detect playing cards at 99.5% accuracy and built a real-time Monte Carlo blackjack odds calculator using computer vision. Summary: After installing Claude Code, the agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands, I was looking for a task to test its abilities. Fairly quickly we wrote fewer than 200 lines of Python code predicting blackjack odds using Monte Carlo simulation. When I went on to test this little tool on the Washington Post's online blackjack (I also didn't know that existed!)
I quickly noticed how impractical it was to manually input all the card values on the table. What if the tool could also handle blackjack card detection automatically and calculate the odds from it? I had never done anything with computer vision, so this seemed like a good challenge. To get any reasonable result, we have to start with classification, where we "teach" the model to categorize data by showing it lots of examples with correct labels. But where do the labels come from? I manually annotated 409 playing cards across 117 images using Roboflow Annotate (at first I annotated only half as many; we'll see in a minute why that wasn't a good idea). Once enough screenshots of cards are annotated, we can train the model to recognize the cards and predict card values on tables it has never seen before. I was able to use an NVIDIA T4 GPU inside Google Colab, which offers some free GPU time when capacity is available. During training, the algorithm learns patterns from this example data, adjusting its internal parameters millions of times until it gets really good at recognizing the differences between categories (in this case, different cards). Once trained, the model can then make predictions on new, unseen data by applying the patterns it learned. With the annotated dataset ready, it was time to implement the actual computer vision model. I chose to build on Ultralytics' pre-trained YOLOv11 model, a leading object detection algorithm. I set up the environment in Google Colab following the "How to Train YOLO11 Object Detection on a Custom Dataset" notebook. After exporting the annotated dataset from Roboflow, I began training the model using the pre-trained YOLOv11s weights as a starting point. This approach, called transfer learning, allows the model to reuse patterns already learned from millions of general images and adapt them to this specific task.
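This training run ended through early stopping: the best epoch was 142 and training halted at epoch 242, after 100 epochs without improvement. A minimal sketch of such a patience rule, assuming a per-epoch validation score where higher is better (such as mAP); `early_stop_epoch` is a hypothetical helper that illustrates the idea and is not Ultralytics' internal code:

```python
from __future__ import annotations


def early_stop_epoch(scores: list[float], patience: int = 100) -> tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs have passed since the best score
    so far without any epoch beating it. If that never happens, stop_epoch
    is simply the last epoch in the budget.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            # New best: reset the patience window.
            best_score = score
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(scores)


# Hypothetical curve: the score rises until epoch 142, then plateaus (350-epoch budget).
curve = [min(e, 142) / 142 * 0.8 for e in range(1, 351)]
print(early_stop_epoch(curve, patience=100))  # (142, 242)
```

Whether the epoch that triggers the stop is counted inside the patience window is a convention; this sketch stops at `best_epoch + patience`, which matches the 142 and 242 figures above.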
I initially set it up to run for 350 epochs, though the model's built-in early stopping mechanism kicked in after 242 epochs when no improvement was observed for 100 consecutive epochs. The best results were achieved at epoch 142, taking around 13 minutes to complete on the Tesla T4 GPU. The initial results were quite promising, with an overall mean Average Precision (mAP) of 80.5% at IoU threshold 0.5. Most individual card classes achieved good precision and recall scores, with only a few cards like the 6 and Queen showing slightly lower precision values. However, looking at the confusion matrix and loss curves revealed some interesting patterns. While the model was learning effectively (as shown by the steadily decreasing loss), there were still some misclassifications between similar cards, particularly among the numbered cards. This is exactly why I said earlier that annotating only half the data initially "wasn't a good idea": more training examples would likely improve these edge cases and reduce confusion between similar-looking cards. My first attempt at solving the remaining accuracy issues was to add another layer to the workflow by sending the detected cards to Anthropic's Claude API for additional OCR processing. This hybrid approach was very effective: combining YOLO's object detection, which dynamically crops the blackjack table down to individual cards, with Claude's advanced vision capabilities yielded 99.9% accuracy on the predicted cards. However, this solution came with a significant drawback: the extra API round trip and the large model's processing overhead added latency, making it impractical for real-time gameplay. Key findings: - YOLOv11 trained on 409 annotated playing cards achieved 99.5% mAP@50, up from 80.5% after doubling annotations and fixing bounding polygon errors.
- Local YOLO inference runs at 45.5ms per image, nearly 90x faster than Roboflow's hosted API at about 4 seconds, making real-time card detection practical. - Claude's Vision API achieved 99.9% accuracy on card recognition but was too slow for gameplay, while EasyOCR failed to detect roughly half the cards entirely. ### [Modeling Glycemic Response with XGBoost](https://philippdubach.com/posts/modeling-glycemic-response-with-xgboost/) Published: 2025-05-30 Description: A hands-on project predicting postprandial glucose curves with XGBoost, Gaussian curve fitting, and 27 engineered features from CGM data. Code on GitHub. Summary: Earlier this year I wrote about how I built a CGM data reader after wearing a continuous glucose monitor myself. Since I was already logging my macronutrients and learning more about molecular biology in an MIT MOOC, I became curious: given a meal's macronutrients (carbs, protein, fat) and some basic individual characteristics (age, BMI), could a machine learning model predict the shape of my postprandial glucose curve? I came across Zeevi et al.'s paper on Personalized Nutrition by Prediction of Glycemic Responses, which used machine learning to predict individual glycemic responses from meal data. Exactly what I had in mind. Unfortunately, neither the data nor the code was publicly available. So I decided to build my own model. In the process I wrote this working paper. Key findings: - XGBoost with 27 engineered features predicted glucose spike amplitude at R-squared 0.46, nearly double the linear regression baseline of 0.24, but time-to-peak prediction was worse than guessing the average. - Meal composition tells you something about how high your blood sugar will rise but almost nothing about when it peaks or how long it stays elevated. - EPFL's Food and You study with 1,000+ participants achieved a correlation of 0.71 using the same XGBoost approach, confirming that data quantity matters more than model complexity.
- Adding gut microbiome data increased the explained variance in glucose peaks from 34% to 42%, showing how much meal composition alone leaves on the table.

### [I Built a CGM Data Reader](https://philippdubach.com/posts/i-built-a-cgm-data-reader/)

Published: 2025-01-02

Description: Built a CGM data analysis tool with Python to visualize FreeStyle Libre 3 glucose data alongside nutrition, workouts, and sleep for endurance cycling.

Summary: Last year I put a continuous glucose monitor (CGM), specifically the Abbott FreeStyle Libre 3, on my left arm. Why? I wanted to optimize my nutrition for endurance cycling competitions. Where I live, the sensor is easy to get, with no prescription required, and even easier to use. Unfortunately, Abbott's FreeStyle LibreLink app is less than optimal (3,250 reviewers, with an average rating of 2.9/5.0, seem to agree). In Abbott's defense, the LibreView web app does offer some nice reports that can be generated as PDFs: not very dynamic, but still something! What I had in mind was closer to the Ultrahuman M1 dashboard. Unfortunately, I couldn't use my Libre sensor (EU firmware) with their app (yes, I checked with customer service).
Key findings:
- Abbott's LibreLink app has a 2.9/5.0 rating from 3,250 reviews, so I built a Python dashboard that merges Libre 3 glucose data with nutrition, workout, and sleep data from five sources
- The dashboard overlays workout traces, meal macros, and sleep phases onto continuous glucose readings, letting you spot correlations the stock app cannot surface
- Key metrics for endurance athletes include time-in-range, coefficient of variation, and pattern analysis around meals and training loads

### [The Tech behind this Site](https://philippdubach.com/posts/the-tech-behind-this-site/)

Published: 2024-01-15

Description: A Hugo blog tech stack with Cloudflare R2 image hosting, responsive WebP shortcodes, Workers AI social automation, and GitHub Pages CI/CD deployment.

Summary: This site runs on Hugo, deployed to GitHub Pages behind the Cloudflare CDN. Images are hosted on R2 (static.philippdubach.com) with automatic resizing and WebP conversion. The core challenge was responsive images: standard Markdown `![alt](url)` doesn't support multiple sizes. I built a Hugo shortcode that generates `<picture>` elements with breakpoint-specific sources, so I upload each image once at full quality and optimized versions (320px for mobile up to 1600px for desktop) are served automatically.

Update (May 2026), cron reliability: Scheduled rebuilds moved off GitHub Actions cron (which drifts by 30+ minutes and occasionally skips runs during platform incidents) onto a Cloudflare Worker that fires `workflow_dispatch` on a deterministic `17 */3 * * *` UTC schedule. Builds still run on Actions; only the trigger moved. The cadence increased from three times daily to every 3 hours, so the maximum publish delay for future-dated posts dropped from ~7h to ~3h. Worker source: `social-automation/build-trigger/`.
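The ~3h bound follows from the schedule itself. `17 */3 * * *` fires at minute 17 of every third hour (00:17, 03:17, ..., 21:17 UTC), eight times a day, so a post dated just after one trigger waits just under three hours for the next one. Below is a minimal Python sketch of that arithmetic. It assumes a post goes live on the first trigger at or after its publish time and ignores build duration. `trigger_times`, `next_trigger`, and `max_publish_delay` are hypothetical helpers, not code from the site's Worker.

```python
from datetime import datetime, timedelta, timezone

# Mirrors the cron fields "17 */3": minute 17 of hours 0, 3, 6, ..., 21.
# Day-of-month, month, and weekday are "*", so every day looks the same.
# All datetimes here are expected to be UTC-aware.
TRIGGER_MINUTE = 17
HOUR_STEP = 3


def trigger_times(day: datetime) -> list[datetime]:
    """All trigger instants on the calendar day containing `day`."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [midnight + timedelta(hours=h, minutes=TRIGGER_MINUTE)
            for h in range(0, 24, HOUR_STEP)]


def next_trigger(publish_at: datetime) -> datetime:
    """First trigger at or after `publish_at`, checking today, then tomorrow."""
    for day_offset in (0, 1):
        for t in trigger_times(publish_at + timedelta(days=day_offset)):
            if t >= publish_at:
                return t
    raise AssertionError("unreachable: triggers occur every 3 hours")


def max_publish_delay(day: datetime) -> timedelta:
    """Worst-case wait over every minute of one day (1-minute resolution)."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = (midnight + timedelta(minutes=m) for m in range(24 * 60))
    return max(next_trigger(t) - t for t in minutes)


if __name__ == "__main__":
    day = datetime(2026, 5, 18, tzinfo=timezone.utc)
    print(next_trigger(datetime(2026, 5, 18, 21, 18, tzinfo=timezone.utc)))
    # 2026-05-19 00:17:00+00:00
    print(max_publish_delay(day))  # 2:59:00
```

The worst case is a post dated one minute after a trigger. It waits 2 hours 59 minutes, which matches the ~3h figure above.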
Key findings:
- The site runs on Hugo with Cloudflare R2 image hosting, serving responsive WebP images from 320px to 1600px via a custom shortcode that generates `<picture>` elements from a single upload
- Social media posts to Bluesky and Twitter are generated automatically by Cloudflare Workers running Llama 4 Scout 17B, with no manual intervention
- A security headers Worker on Cloudflare injects HSTS, CSP, and COEP headers because GitHub Pages does not process `_headers` files natively
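The shortcode turns one full-quality upload into a `<picture>` element with breakpoint-specific WebP sources, and that job can be sketched outside Hugo. This Python version is illustrative only. The following are all assumptions, not the site's actual shortcode or any CDN's documented URL scheme: the intermediate widths between the stated 320px and 1600px endpoints, the `?width=...&format=webp` resize query, the `sizes` value, and the helper names `resized_url` and `picture_html`.

```python
from html import escape

# Assumed breakpoints: only the 320px and 1600px endpoints come from the post.
WIDTHS = (320, 640, 960, 1200, 1600)


def resized_url(base: str, path: str, width: int) -> str:
    """Hypothetical resize-on-request URL; adapt to the image CDN actually in use."""
    return f"{base}/{path}?width={width}&format=webp"


def picture_html(base: str, path: str, alt: str,
                 widths: tuple[int, ...] = WIDTHS) -> str:
    """Build a <picture> with a WebP srcset and the original upload as fallback."""
    srcset = ", ".join(
        f"{escape(resized_url(base, path, w))} {w}w" for w in widths
    )
    fallback = escape(f"{base}/{path}")  # browsers without WebP get the original
    return (
        "<picture>"
        f'<source type="image/webp" srcset="{srcset}" '
        'sizes="(max-width: 768px) 100vw, 800px">'
        f'<img src="{fallback}" alt="{escape(alt)}" loading="lazy">'
        "</picture>"
    )


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(picture_html("https://static.example.com", "posts/stack.jpg", "Site architecture"))
```

The `srcset` uses `w` descriptors, so the browser chooses a variant based on the rendered size declared in `sizes` and the device pixel ratio. On a narrow phone screen, that typically means fetching one of the smaller files rather than the 1600px version.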