# Philipp D. Dubach - Full Content Index > This is the extended version of llms.txt with full article summaries inline. For the compact version, see [llms.txt](http://philippdubach.com/llms.txt). Independent researcher and strategy consultant specializing in quantitative finance, AI infrastructure economics, and macroeconomic analysis. - Last Updated: 2026-03-17 - Total Articles: 75 - Site: http://philippdubach.com/ ## AI (32 articles) ### [The Last Architecture Designed by Hand](http://philippdubach.com/posts/the-last-architecture-designed-by-hand/) Published: 2026-03-16 Description: The transformer's limits are now mathematical proofs, not empirical hunches. Hybrids are in production. AI is searching for its own replacement. Here's what comes after. Summary: 'I bet there is another new architecture to find that is gonna be as big of a gain as transformers were over LSTMs.' Sam Altman, the CEO of the company most invested in the transformer, is telling a room of students it isn't the final form. So what comes after the transformer? He's probably right that something will, and the evidence is no longer anecdotal. Several recent papers have proved that the transformer's worst properties are structural, not engineering problems to be fixed with better data or more compute, but mathematical lower bounds. Key findings: - Mathematical proofs now show that quadratic scaling, hallucination, and positional bias are structural properties of the transformer, not fixable with better training data or RLHF. - Over 60% of frontier models released in 2025 use Mixture of Experts, and production hybrids like Jamba and Qwen3-Next blend attention with state space models at 3x throughput. - AlphaEvolve found a 23% speedup inside Gemini's own architecture, cutting training time by 1% and recovering 0.7% of Google's total compute resources. - OpenAI's inference spending hit $2.3 billion in 2024, 15x what they spent training GPT-4.5, meaning the economic center of gravity has already shifted from training to inference. ### [MCP vs A2A in 2026: How the AI Protocol War Ends](http://philippdubach.com/posts/mcp-vs-a2a-in-2026-how-the-ai-protocol-war-ends/) Published: 2026-03-15 Description: MCP leads with 97M monthly SDK downloads and 10,000+ servers. A2A fills a different layer. Analysis of the agentic AI standards war with historical parallels. Summary: On March 26, 2025, Sam Altman posted the following three sentences: 'people love MCP and we are excited to add support across our products.' MCP is Anthropic's Model Context Protocol. OpenAI is Anthropic's most direct competitor. Altman was endorsing a rival's standard. That post may be the most significant event in enterprise AI infrastructure this year. When your main competitor adopts your protocol, the war is close to over. I've been watching this play out since Anthropic launched MCP in November 2024, and I want to work through what's happening: who controls what, what 'interoperability' means in practice, and whether any of this follows patterns we've seen before. Key findings: - MCP reached 10,000+ servers and 97 million monthly SDK downloads before A2A launched, compounding a five-month head start into a structural ecosystem lead. - OpenAI adopting MCP in March 2025 mirrors the iMac's USB-only bet in 1998: one player so central to the ecosystem that their adoption made the standard inescapable. - The agentic AI market is $7-8 billion in 2025, with analyst projections ranging from $50 billion to $199 billion by 2034 at 40-50% annual growth.
- 53% of MCP servers still rely on static credentials rather than OAuth, and a critical npm package vulnerability (CVE-2025-6514) exposed 437,000+ installations to shell injection. ### [AI Models Are the New Rebar](http://philippdubach.com/posts/ai-models-are-the-new-rebar/) Published: 2026-03-11 Description: Qwen 3.5-35B runs on a gaming PC and matches Claude Sonnet 4.5. When the commodity version is 95% as good and 97% cheaper, you have a pricing problem. Summary: Qwen 3.5-35B-A3B, a model released by Alibaba in February 2026, runs on a single consumer GPU with 24 gigabytes of VRAM. A secondhand RTX 4090, available for around $2,000, generates 60 to 100 tokens per second with it. On select benchmarks per Alibaba's own evaluations, it matches or beats Claude Sonnet 4.5. The Qwen 3.5 Flash tier costs $0.10 per million input tokens through Alibaba's API. Claude Sonnet 4.5 costs $3.00. That's a 97 percent discount. For comparable performance. Key findings: - Qwen 3.5-35B matches Claude Sonnet 4.5 on select benchmarks at $0.10 per million input tokens versus $3.00, a 97 percent cost gap for comparable performance. - The performance gap between open-source and proprietary AI models shrank from 8 percent to 1.7 percent in a single year, per the Stanford HAI 2025 AI Index. - OpenAI's adjusted gross margin fell from 40 to 33 percent in 2025 as inference costs quadrupled to $8.4 billion, while the company lost $13.5 billion in H1 2025. - AI inference prices decline at a median rate of 50x per year for equivalent performance, according to Epoch AI, a pace that dwarfs Moore's Law. ### [AI Capex Arms Race: Who Blinks First?](http://philippdubach.com/posts/ai-capex-arms-race-who-blinks-first/) Published: 2026-03-08 Description: Alphabet's free cash flow is on track to fall 90% in 2026. Amazon's is at $11B. $690B in AI capex is cannibalizing the cash that justified these valuations. Summary: Alphabet's free cash flow is projected to fall roughly 90% in 2026. Not because the business is in trouble. Because the company has committed to spending $83–93 billion more on capital expenditure than it did last year. That is what $660–690 billion in AI capex looks like up close. Amazon guided to $200 billion alone. Meta's long-term debt more than doubled to $58.7 billion to help finance its share. Goldman Sachs projects cumulative 2025–2027 spending across the Big 4 at $1.15 trillion, more than double the $477 billion spent over the prior three years combined. BofA credit strategists found this will consume 94% of operating cash flow minus dividends and buybacks. Key findings: - The Big 4 hyperscalers are on track to spend $610–665 billion in 2026, roughly 70% above 2025 levels, with Goldman Sachs projecting cumulative 2025–2027 spend at $1.15 trillion - Alphabet's free cash flow may fall from $73 billion to roughly $8 billion in 2026 as capex doubles; Amazon's is already compressed to $11 billion TTM with $200B guidance ahead - Direct AI revenue covers roughly 15% of AI-specific capex: Sequoia's David Cahn calculated the ecosystem needs $600 billion in annual revenue to justify current infrastructure spending, against the roughly $50–100 billion it actually generates - Inference costs are falling 50–200x per year (Epoch AI), meaning existing GPU infrastructure may become stranded faster than depreciation schedules assume ### [Peter Thiel's Physics Department](http://philippdubach.com/posts/peter-thiels-physics-department/) Published: 2026-03-02 Description: Peter Thiel says physics stalled in 1972. 
Then GPT-5.2 proved a new result in theoretical physics. The 75:1 AI compute gap between commerce and science. Summary: On December 11, Jimmy Carr sat on the TRIGGERnometry podcast and delivered a riff that sounded like Peter Thiel's stagnation thesis filtered through a comedian's timing: Minus the screens from any room, we're living in the 1970s. Nothing's happened in physics since '72. String theory has not got us anywhere. But if you take the compute power of AI and point it at physics, what happens? We could have a world of plenty. I hope that's the world we live in. But it could go another way. Key findings: - Total factor productivity growth fell from 1.7% annually (1947-1973) to 0.4% since 2004, a 76% decline that underpins Thiel's stagnation thesis - Big Tech spends 75x more on AI than the entire US federal science budget: $250 billion versus $3.3 billion per year - GPT-5.2 derived a new result in theoretical physics on February 13, 2026, overturning a decades-old assumption about gluon scattering amplitudes - AI progressed from Olympiad geometry to IMO gold to a theoretical physics proof in 25 months, all on less than 1.3% of commercial AI compute ### [Every Bulge Bracket Bank Agrees on AI](http://philippdubach.com/posts/every-bulge-bracket-bank-agrees-on-ai/) Published: 2026-03-01 Description: I read 12 AI research reports from Goldman Sachs, JPMorgan, UBS, and 6 other banks. Here's the consensus they're pushing, and what they're not saying. Summary: I spent the last week reading 12 bank AI research reports from nine of the world's largest financial institutions: Goldman Sachs, JPMorgan, Morgan Stanley (three separate reports), UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander. I wanted to understand how institutions that collectively manage trillions of dollars and employ thousands of analysts actually see this technology heading into 2026: where they agree, where they diverge, and what they're being less than forthcoming about. Key findings: - Not a single report from any of the nine institutions recommends reducing AI exposure. The absence of a bearish voice is itself the most important signal in the entire collection - The macro productivity estimates span from +0.7% to +15% TFP over ten years, using the same underlying academic papers, cherry-picked to support nine different commercial narratives - Only ~10% of US companies are productively using AI and 42% have abandoned GenAI projects. The gap between capex commitment and actual adoption is the most underweighted risk in the consensus - AI capex already contributed 1.4–1.5 percentage points to US GDP growth in H1 2025, making infrastructure spending the dominant driver of US economic expansion in that period - Morgan Stanley's historical data shows second-order beneficiaries outperform first-order enablers by 10–100x over long horizons, yet nearly every bank's current positioning favours first-order plays anyway ### [When AI Labs Become Defense Contractors](http://philippdubach.com/posts/when-ai-labs-become-defense-contractors/) Published: 2026-03-01 Description: The Anthropic-Pentagon standoff isn't an ethics story. It's a replay of the 1993 Last Supper that consolidated 51 defense primes into 5, at Silicon Valley speed. Summary: Lockheed started by building Amelia Earhart's favorite plane. 
Then came a government loan guarantee in 1971 (the L-1011 TriStar nearly killed the company), a Cold War, decades of consolidation, and now a business that earns 92.5% of its revenue from government contracts, with the F-35 alone accounting for 26% of its $71 billion in annual sales. The process took about 50 years. AI labs becoming defense contractors will happen faster. On February 27, 2026, two things happened within hours of each other. President Trump ordered every federal agency to 'IMMEDIATELY CEASE all use of Anthropic's technology' after CEO Dario Amodei refused to strip safety constraints from Claude's Pentagon deployment, specifically prohibitions on mass domestic surveillance and fully autonomous weapons. Defense Secretary Pete Hegseth then labeled Anthropic a 'Supply-Chain Risk to National Security,' a designation previously reserved for foreign adversaries like Huawei, never before applied to an American company. That evening, Sam Altman announced that OpenAI had signed a deal to deploy its models on the Pentagon's classified network, posting that the Department of War 'displayed a deep respect for safety.' (Whether that reflects the Pentagon's actual position or Altman's political optimism remains unclear for now.) Key findings: - The FY2026 Pentagon AI budget jumped to $13.4 billion from $1.8 billion, a 7x increase in a single budget cycle, now nearly the size of Anthropic's entire annualized revenue of $14 billion. - After the 1993 Last Supper, 51 prime defense contractors collapsed into 5 within four years. AI labs face the same consolidation logic, just faster: through classified network access and government-funded compute rather than M&A. - IDIQ contracts account for 56% of DoD award dollars and run five years with extensions. Once embedded in classified systems with a security-cleared workforce (243-day average clearance processing), switching costs become close to prohibitive. - Palantir's trajectory previews the endgame: $4.48 billion FY2025 revenue (up 56%), 53.7% from government, now worth nearly twice Boeing at $320 billion market cap. ### [The Impossible Backhand](http://philippdubach.com/posts/the-impossible-backhand/) Published: 2026-02-17 Description: AI converges to the mean by design. Ninth-power scaling costs and a 53-point gap on Humanity's Last Exam show domain expertise is appreciating, not declining. Summary: In the latest issue of The AI Lab Newsletter, I featured a ByteDance Seedance 2.0 clip: two men playing tennis at what looked like an ATP tournament. Photorealistic. I probably wouldn't be able to tell it wasn't real footage if I didn't know. A co-worker who played junior pro-am tennis watched the same clip and said: 'That backhand doesn't exist. Nobody plays it like that.' His domain expertise spotted an error that probably fooled everyone else.
Key findings: - Computational cost scales with the ninth power of improvement in practice: halving an AI error rate requires more than 500x the computational resources - On Humanity's Last Exam, top AI scores 37.5% versus human domain experts at roughly 90%, a 53-point gap, with AI calibration errors ranging from 34% to 89% - The Harvard/BCG study of 758 consultants found AI users produced 40% higher quality work within AI's frontier but were 19 percentage points less accurate outside it when they blindly trusted the output - Oxford researchers found complementary effects of AI on jobs are 1.7x larger than substitution effects, supporting augmentation over replacement ### [The SaaSpocalypse Paradox](http://philippdubach.com/posts/the-saaspocalypse-paradox/) Published: 2026-02-13 Description: AI capex failure and AI replacing all software are mutually exclusive. Why the 2026 SaaSpocalypse is a $2 trillion pricing error, not an extinction event. Summary: The market is simultaneously pricing AI capex failure and AI destroying all software. Both cannot be true. Anthropic released 11 open-source plugins for Claude Cowork on January 30. Apache-2.0 licensed, file-based, running in a macOS-only research preview. Within a week, the IGV software ETF had fallen 32% from its September peak to a 52-week low of $79.65, roughly $2 trillion in market cap had evaporated, and hedge funds had made $24 billion shorting the sector. The RSI hit 18, the most oversold reading since 1990. JP Morgan titled their note 'Software Collapse Broadens with Nowhere to Hide.' Jefferies coined the term SaaSpocalypse. It was the worst software stock crash since the dot-com bust. Key findings: - The market is simultaneously punishing hyperscalers for weak AI capex returns and destroying software stocks because AI adoption is so strong it replaces all software, both cannot be true - The IGV software ETF fell 32% to an RSI of 18, the most oversold reading since 1990, while the sector delivered 17% aggregate earnings growth and every major name beat Q4 2025 estimates - Recurring-revenue software with 90%+ gross margins now trades at 32.4x forward earnings versus 43.6x for cyclical semiconductors, an 11.2x inversion that has not persisted historically - Goldman Sachs projects the application software market growing to $780 billion by 2030, with a16z arguing AI expands the addressable market from $350 billion in software to $6 trillion in white-collar services ### [Don't Go Monolithic; The Agent Stack Is Stratifying](http://philippdubach.com/posts/dont-go-monolithic-the-agent-stack-is-stratifying/) Published: 2026-02-10 Description: The enterprise AI agent stack is stratifying into six layers with different winners at each. Models commoditize; context — your organizational world model — compounds. A framework for agentic AI architecture decisions. Summary: The defensible asset in enterprise AI is not the model. It's the organizational world model. Every major compute era decomposes into specialized layers with different winners at each level. Cloud split into IaaS, PaaS, and SaaS. The modern data stack split into ingestion, warehousing, transformation, and BI. Each time, specialists beat the generalists because the layers have fundamentally different economics: different rates of change, different capital requirements, different sources of lock-in. Key findings: - 37% of enterprises now use five or more AI models in production, making single-provider lock-in the new version of single-cloud risk. 
- The enterprise AI agent stack is stratifying into six layers with different winners at each, and context, not models, sits in the highest lock-in and hardest-to-rebuild zone. - Most enterprise AI failures stem from shallow context: agents can retrieve the right documents but cannot reconstruct the reasoning processes humans follow to make decisions. - Gartner predicts 40% of enterprise apps will feature AI agents by 2026 but warns over 40% of agentic AI projects will be canceled by end of 2027 due to unclear business value. ### [Where Mobile Money Goes Now](http://philippdubach.com/posts/where-mobile-money-goes-now/) Published: 2026-02-07 Description: Apps overtook games in mobile IAP revenue for the first time in 2025, driven by $3.5B in GenAI growth. Analysis of Sensor Tower's State of Mobile 2026 report. Summary: Sensor Tower's State of Mobile 2026 report confirms what had been building for years: the mobile app economy has permanently shifted. For the first decade of mobile, games made more money than everything else combined. Clash of Clans and Candy Crush built empires on freemium. King went public. Supercell sold for $10 billion. That changed in 2025: non-game applications now generate more in-app purchase revenue than games. Apps crossed $85.6 billion in 2025, up 21% year-over-year. Games managed $81.8 billion, barely moving from the year before. Key findings: - Non-game apps hit $85.6 billion in mobile IAP revenue in 2025, overtaking games ($81.8B) for the first time, with GenAI adding $3.5 billion as the single largest growth category. - ChatGPT accounts for 40% of GenAI consumer app spending, making it the third-highest grossing app globally behind TikTok and Google One. - GenAI users cluster demographically with Reddit and X, not Instagram or Pinterest, meaning AI apps are scaling revenue while still reaching a niche audience. - YouTube is the only app ranked #1 across every US age group from 18-24 to 45+, something TikTok, Instagram, and Facebook have never achieved. ### [Claude Opus 4.6: Anthropic's New Flagship AI Model for Agentic Coding](http://philippdubach.com/posts/claude-opus-4.6-anthropics-new-flagship-ai-model-for-agentic-coding/) Published: 2026-02-05 Description: Claude Opus 4.6 brings a 1M token context window, 68.8% ARC-AGI-2, and Agent Teams to Claude Code. Full benchmark comparison vs GPT-5.2 and Gemini 3 Pro with pricing analysis. Summary: Anthropic just released Claude Opus 4.6, the latest frontier AI model in the Claude family. It's a big upgrade over Opus 4.5 and probably the most agentic-focused LLM release from any lab this year. Key upgrades: better agentic AI coding capabilities (plans more carefully, sustains longer tasks, catches its own mistakes), a 1M token context window (a first for Opus-class models), and 128K output tokens. Pricing holds at $5/$25 per million tokens.
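To make the pricing trade-off concrete, here is a minimal cost sketch, mine rather than the article's: it uses the published $5/$25 (Opus 4.6) and $2/$10 (GPT-5.2) per-million-token rates, while the task size and retry behaviour below are assumed numbers purely for illustration.

```python
# Toy per-task cost comparison. Published prices: Opus 4.6 at $5/$25 and
# GPT-5.2 at $2/$10 per million input/output tokens. The token counts below
# are hypothetical; the point is the break-even retry ratio, not the totals.

def attempt_cost(input_tok: int, output_tok: int, price_in: float, price_out: float) -> float:
    """Dollar cost of a single attempt at a task."""
    return (input_tok * price_in + output_tok * price_out) / 1_000_000

# Assumed agentic coding task: 40K tokens in, 8K tokens out per attempt.
opus = attempt_cost(40_000, 8_000, 5.00, 25.00)  # $0.40 per attempt
gpt = attempt_cost(40_000, 8_000, 2.00, 10.00)   # $0.16 per attempt

print(f"Opus 4.6 ${opus:.2f}/attempt, GPT-5.2 ${gpt:.2f}/attempt")
print(f"Break-even: {opus / gpt:.1f} GPT-5.2 attempts per completed Opus task")
```

At these assumed numbers the cheaper model only loses on cost once it needs roughly 2.5 attempts for every task Opus 4.6 completes in one, which is exactly the retry question the pricing finding below turns on.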
Key findings: - Opus 4.6 scores 68.8% on ARC-AGI-2 versus 37.6% for Opus 4.5 and 54.2% for GPT-5.2, the largest single-generation leap on the benchmark that resists memorization - The 1M token context window scores 76% on MRCR v2 needle-in-a-haystack retrieval versus 18.5% for Sonnet 4.5, a different capability class rather than an incremental improvement - Opus 4.6 beats GPT-5.2 by 144 Elo points on GDPval-AA, the benchmark measuring real-world knowledge work across 44 professional occupations - Pricing holds at $5/$25 per million tokens versus GPT-5.2 at $2/$10, so the value proposition depends on whether agentic improvements translate to fewer retries and faster task completion ### [Buying the Haystack Might Not Work This Year](http://philippdubach.com/posts/buying-the-haystack-might-not-work-this-year/) Published: 2026-01-31 Description: a16z sees AI fundamentals thriving with 80% GPU utilization. AQR sees the CAPE at the 96th percentile. Both have data. Both may be right. Summary: I've been reading the January 2026 state of markets reports from Andreessen Horowitz and AQR, and their conclusions on the AI bubble question in 2026 are almost impossible to reconcile. The a16z view is straightforward: AI fundamentals are real, and current prices reflect that reality. Their evidence is compelling. The top 50 private AI companies now generate $40.6 billion in annual revenue. Companies like ElevenLabs and Cursor are hitting $100 million ARR faster than Slack or Twilio ever did. GPUs are running at 80% utilization, compared to the 7% utilization rate for fiber optic cables during the dotcom bubble. This isn't speculation, they argue. It's demand exceeding supply. AQR looks at the same market and sees something else entirely. Their capital market assumptions put the U.S. CAPE ratio at the 96th percentile since 1980. Expected real returns for U.S. large cap equities over the next 5-10 years? 3.9%. For a global 60/40 portfolio, just 3.4%, well below the long-term average of roughly 5% since 1900. Risk premia, in their framework, are compressed across nearly every asset class. The narrative doesn't enter their models. a16z points to earnings growth. The market rally hasn't been driven by multiple expansion, they note, but by actual EPS growth. Tech P/E multiples sit around 30-35x, elevated but nowhere near the 70-80x of 2000. Tech margins have 'lapped the field' at 25%+ compared to 5-8% for the rest of the S&P 500. The fundamentals, they insist, are doing the work. AQR's response would be that fundamentals always look good near peaks. Their research shows a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade. Compressed premia don't announce themselves with blaring headlines. They just quietly erode returns until investors notice they've been running in place. Key findings: - GPU utilization runs at 80% versus 7% for fiber optic cables during the dotcom era, but the U.S. CAPE ratio sits at the 96th percentile since 1980, historically associated with low future returns - Cumulative hyperscaler capex is projected to hit $4.8 trillion by 2030, requiring roughly $1 trillion in annual AI revenue to clear a 10% hurdle rate - Non-U.S. developed markets offer expected returns around 5% versus 3.9% for U.S. 
large caps, a valuation gap that holds even if the AI story is true - AQR estimates a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade ### [Bandits and Agents: Netflix and Spotify Recommender Stacks in 2026](http://philippdubach.com/posts/bandits-and-agents-netflix-and-spotify-recommender-stacks-in-2026/) Published: 2026-01-30 Description: How hybrid recommender systems balance multi-armed bandits against LLM inference cost economics in 2026. A deep dive into Netflix recommendation algorithm architecture and Spotify's AI DJ recommender system. Summary: Hyperscalers spent over $350 billion on AI infrastructure in 2025 alone, with projections exceeding $500 billion in 2026. The trillion-dollar question is not whether machines can reason, but whether anyone can afford to let them. Hybrid recommender systems sit at the center of this tension. Large Language Models promised to transform how Netflix suggests your next show or how Spotify curates your morning playlist. Instead, the industry has split into two parallel universes, divided not by capability but by cost. Key findings: - A single LLM recommendation consumes thousands of tokens while a collaborative filtering dot product costs a fraction of a cent, making full LLM inference economically impossible at Netflix or Spotify scale - Netflix measures recommendation value by incrementality, the causal lift of showing a title versus not showing it, because a greedy algorithm that always surfaces high-probability titles collapses the discovery space - Spotify's AI DJ uses an agentic router that decides per-query whether to invoke the expensive LLM or fall back to fast keyword matching, an inference cost optimizer disguised as a product feature - Hyperscalers spent over $350 billion on AI infrastructure in 2025, but the industry consensus is a hybrid funnel: cheap models for millions of candidates, expensive reasoning only for the final dozen items a user sees ### [The Most Expensive Assumption in AI](http://philippdubach.com/posts/the-most-expensive-assumption-in-ai/) Published: 2026-01-26 Description: Sara Hooker's research challenges the trillion-dollar scaling thesis. Compact models now outperform massive ones as diminishing returns hit AI. Summary: Sara Hooker's paper arrived with impeccable timing. 'On the slow death of scaling' dropped just as hyperscalers are committing another $500 billion to GPU infrastructure, bringing total industry capital deployed into the scaling thesis to somewhere north of a trillion dollars. I've been tracking these capital flows for my own portfolio. Either Hooker is early to a generational insight or she's about to be very publicly wrong. The core argument is very simple: bigger is not always better. Llama-3 8B outperforms Falcon 180B. Aya 23 8B beats BLOOM 176B despite having only 4.5% of the parameters. These are not isolated flukes. Hooker plots submissions to the Open LLM Leaderboard over two years and finds a systematic trend where compact models consistently outperform their bloated predecessors. The bitter lesson, as Rich Sutton framed it, was that brute force compute always wins. Hooker's counter is that maybe we've been held hostage to 'a painfully simple formula' that's now breaking down. Scaling laws, she notes, only reliably predict pre-training test loss. When you look at actual downstream performance, the results are 'murky or inconsistent.'
The term 'emergent properties' gets thrown around to describe capabilities that appear suddenly at scale, but Hooker points out this is really just a fancy way of admitting we have no idea what's coming. If your scaling law can't predict emergence, it's not much of a law. Key findings: - Compact models now outperform massive predecessors: Llama-3 8B beats Falcon 180B, Aya 23 8B beats BLOOM 176B at 4.5% of parameters - Scaling laws only reliably predict pre-training loss, not downstream performance, because emergent properties mask our inability to predict what's next - Hedge fund short interest in AI-adjacent utilities sits at the 99th percentile vs. the past 5 years - Frontier labs are incorporating classical symbolic tools on CPUs, meaning the age of brute-force scaling may be ending ### [Enterprise AI Strategy is Backwards](http://philippdubach.com/posts/enterprise-ai-strategy-is-backwards/) Published: 2026-01-22 Description: 85% of AI projects fail. Only 26% translate pilots to production. The winners automate the coordination layer where employees spend 57% of their workday. Summary: That’s the claim made by LinkedIn co-founder Reid Hoffman. It’s a bold assertion, so I set out to investigate whether the data supports it. The result is a comprehensive report, backed by more than 30 sources. You can download the full report and the accompanying presentation for free. Key findings: - 85% of enterprise AI projects fail; only 26% of companies translate pilots to production - Employees spend 57% of their workday on coordination, the layer AI should target first - Language models bridge messy communication to structured data: transcripts to CRM fields at 99% accuracy, 30% higher win rates - AI gains compound when knowledge capture becomes shareable across the organization ### [Does AI mean the demand on labor goes up?](http://philippdubach.com/posts/does-ai-mean-the-demand-on-labor-goes-up/) Published: 2026-01-15 Description: AI was supposed to free us. The Jevons paradox plays out in real time: efficiency expands workload, not leisure. 77% of workers say AI added to their work. Summary: Joe Weisenthal from Bloomberg, this week: All my shower thoughts now are about designing efficient workflows for synthesizing, collecting, labeling and annotating data. Same. Since I started building every app and tool I thought would make my life easier, my workflow more efficient, I haven't stopped. Apparently non-developers are now writing apps instead of buying them. This is the AI productivity paradox in miniature: the tools get better and we do more, not less. Key findings: - Workers in AI-exposed occupations now work roughly 3 extra hours per week, and leisure time has dropped by the same amount, according to NBER research - 77% of employees say AI tools have added to their workload, not reduced it, per Upwork's survey data - Only 21% of employees use time saved by AI for personal life, with the rest reinvesting it directly back into work - The Jevons paradox from 1865 predicted this: more efficient steam engines increased coal consumption, and more efficient AI tools are increasing work output expectations the same way ### [Social Media Success Prediction: BERT Models for Post Titles](http://philippdubach.com/posts/social-media-success-prediction-bert-models-for-post-titles/) Published: 2026-01-10 Description: Training RoBERTa to predict Hacker News success revealed temporal leakage inflating metrics. How temporal splits, calibration, and regularization fix it. 
Summary: Last week I published a Hacker News title sentiment analysis based on the 'Attention Dynamics in Online Communities' paper I have been working on. The discussion on Hacker News raised the obvious question: can you actually predict what will do well here? The honest answer is: partially. Timing matters. News cycles matter. Who submits matters. Weekend versus Monday morning matters. Most of these factors aren't in the title. But titles aren't nothing either. 'Show HN' signals something. So do phrasing, length, and topic selection. The question becomes: how much signal can you extract from 80 characters? Key findings: - A fine-tuned RoBERTa model achieves 0.685 AUC predicting Hacker News success from titles alone, with the top 10% of predictions hitting at 1.9x the random baseline - Switching from random to temporal train/test splits dropped the ensemble AUC from 0.714 to 0.693 and collapsed SBERT's contribution from 0.35 weight to 0.10, exposing temporal leakage - Increasing dropout to 0.2, weight decay to 0.05, and freezing 6 lower transformer layers cut the train-test overfitting gap by 61% while barely affecting test performance - Isotonic calibration reduced Expected Calibration Error from 0.089 to 0.043, meaning predicted probabilities now match observed hit rates ### [Beyond Vector Search: Why LLMs Need Episodic Memory](http://philippdubach.com/posts/beyond-vector-search-why-llms-need-episodic-memory/) Published: 2026-01-09 Description: Context windows aren't memory. Explore EM-LLM's episodic architecture, knowledge graph tools like Mem0 and Letta, and why vectors fail for sequential data. Summary: You've seen this message before: Copilot pausing in long sessions. It happens often enough that I started wondering what's actually going on in there. Hence this post. The short answer: context windows grew larger. Claude handles 200K tokens, Gemini claims a million. But bigger windows aren't memory. They're a larger napkin you throw away when dinner's over. Key findings: - Context windows grew to 200K tokens (Claude) and 1M (Gemini), but bigger windows are not memory because they lack persistence, temporal awareness, and the ability to update facts across sessions - EM-LLM segments conversation into episodes using surprise detection, and its event boundaries correlate with where humans perceive breaks in experience - HeadKV found you can discard 98.5% of a transformer's key-value cache by keeping only the attention heads that matter for memory, with almost no quality loss - Mem0 reports 80-90% token cost reduction with a 26% quality improvement by replacing raw chat history with structured memory, though the claim is unvalidated ### [65% of Hacker News Posts Have Negative Sentiment, and They Outperform](http://philippdubach.com/posts/65-of-hacker-news-posts-have-negative-sentiment-and-they-outperform/) Published: 2026-01-07 Description: Sentiment analysis of 32,000 Hacker News posts shows 65% skew negative and earn 27% more points. Six transformer and LLM models tested, full data included. Summary: This Hacker News sentiment analysis began with a simple observation: posts with negative sentiment average 35.6 points on Hacker News. The overall average is 28 points. That's a 27% performance premium for negativity. This finding comes from an empirical study I've been running on HN attention dynamics, covering decay curves, preferential attachment, survival probability, and early-engagement prediction. The preprint is available on SSRN.
I already had a gut feeling. Across 32,000 posts and 340,000 comments, nearly 65% register as negative. This might be a feature of my classifier being miscalibrated toward negativity; yet the pattern holds across six different models. Key findings: - Across 32,000 Hacker News posts, nearly 65% register as negative sentiment, a pattern that holds across six different models including DistilBERT, RoBERTa, and Llama 3.1 8B - Negative posts average 35.6 points versus 28 overall, a 27% engagement premium for negativity, though most HN negativity is substantive critique rather than toxicity - Score distribution follows a power law with high Gini coefficients, meaning a small fraction of posts capture most attention while the majority get almost none ### [RSS Swipr: Find Blogs Like You Find Your Dates](http://philippdubach.com/posts/rss-swipr-find-blogs-like-you-find-your-dates/) Published: 2026-01-05 Description: Build an open-source ML RSS reader with swipe interface. Uses MPNet embeddings and Thompson sampling for personalized feeds that escape the filter bubble. Summary: Algorithmic timelines are everywhere now. But I still prefer the control of RSS. Readers are good at aggregating content but bad at filtering it. What I wanted was something borrowed from dating apps: instead of an infinite list, give me cards. Swipe right to like, left to dislike. Then train a model to surface what I actually want to read. So I built RSS Swipr. Key findings: - MPNet embeddings combined with a Hybrid Random Forest achieve 75.4% ROC-AUC on article preference prediction, up from 66% with hand-engineered features alone. - Thompson sampling allocates 80% of shown articles to predicted preferences and 20% to random exploration, preventing filter bubbles while keeping recommendations useful. - The entire system runs locally at zero cost: Python/Flask backend, vanilla JS frontend, SQLite storage, and free-tier Google Colab for GPU training. ### [Apple's AI Bet: Playing the Long Game or Missing the Moment?](http://philippdubach.com/posts/apples-ai-bet-playing-the-long-game-or-missing-the-moment/) Published: 2025-12-30 Description: Apple's $157B cash pile and Gemini-powered Siri shift show a restrained AI strategy. Analysis of whether Apple wins as AI models become commodities. Summary: The Information published a piece today arguing that Apple's restrained AI approach may finally pay off in 2026. The thesis: while OpenAI, Google, and Meta pour hundreds of billions into data centers and model training, Apple has kept its powder dry, sitting on $157 billion in cash and marketable securities as of Q4 2025. If the AI spending bubble deflates, Apple's position looks rather clever. This piqued my interest, from a strategy point of view: Apple hasn't been absent from AI. They've been making a specific bet that large language models will commoditize, and that value will flow to distribution and customer relationships rather than to whoever has the best model. The revamped Siri expected in spring 2026 will reportedly be powered by Google's Gemini through a deal worth $1 billion annually. The custom Gemini model will run on Apple's Private Cloud Compute servers. This is consistent with Apple's history. They didn't build their own search engine. They took Google's money to be the default on Safari. John Giannandrea's retirement earlier this month, with Siri now under Mike Rockwell, signals internal recognition that something had to change. 
Key findings: - Apple sits on $157B in cash while Microsoft, Google, Amazon, and Meta spend roughly $400B collectively on AI infrastructure in 2025, betting that LLMs will commoditize. - The revamped Siri will be powered by Google's Gemini through a $1B annual deal running on Apple's Private Cloud Compute servers, treating models as interchangeable utilities. - AI API pricing has dropped 97% since GPT-3's launch, and Apple can push features to 2.3B active devices via software updates, favoring distribution over R&D spending. ### [Is AI Really Eating the World? AGI, Networks, Value [2/2]](http://philippdubach.com/posts/is-ai-really-eating-the-world-agi-networks-value-2/2/) Published: 2025-11-24 Description: AGI predictions miss the point. Multiple competing models means price war. Value flows to applications, customer relationships, and vertical integrators. Summary: Start by reading Is AI Really Eating the World? What we've Learned [1/2] All current recommendation systems work by capturing and analyzing user behavior at scale. Netflix needs millions of users watching millions of hours to train its recommendation algorithm. Amazon needs billions of purchases. The network effect comes from data scale. What if LLMs can bypass this? What if an LLM can provide useful recommendations by reasoning about conceptual relationships rather than requiring massive behavioral datasets? If I ask for 'books like Pirsig's Zen and the Art of Motorcycle Maintenance but more focused on Eastern philosophy,' a sufficiently capable LLM might answer well without needing to observe 100 million readers. It understands (or appears to understand) the conceptual space. I'm uncertain whether LLMs can do this reliably by the end of 2025. The fundamental question is whether they reason or pattern-match at a very sophisticated level. Recent research suggests LLMs may rely more on statistical correlations than true reasoning. If it's mostly pattern-matching, they still need the massive datasets and we're back to conventional network effects. If they can actually reason over conceptual spaces, that's different. That would unbundle data network effects from recommendation quality. Recommendation quality would depend on model capability, not data scale. And if model capability is commoditizing, then the value in recommendations flows to whoever owns customer relationships and distribution, not to whoever has the most data or the best model. I lean toward thinking LLMs are sophisticated pattern-matchers rather than reasoners, which means traditional network effects still apply. But this is one area where I'm genuinely waiting to see more evidence. Key findings: - Even if AGI arrives by 2028, multiple competing providers will likely reach it simultaneously, meaning prices collapse toward marginal cost and value flows to AI users, not providers. - GPT-4 launched with a substantial capability lead in March 2023, but within six months Claude 2 was comparable, suggesting frontier leads are measured in months, not years. - LLMs likely remain sophisticated pattern-matchers rather than true reasoners, which means traditional data network effects still apply and recommendation incumbents keep their moats. - The hyperscaler playbook (infrastructure + model + distribution + bundling) is more plausible for value capture than the 'best model wins' thesis, mirroring how AWS, not databases, captured cloud value. ### [Is AI Really Eating the World? 
[1/2]](http://philippdubach.com/posts/is-ai-really-eating-the-world-1/2/) Published: 2025-11-23 Description: Hyperscalers spend $400B on AI, API prices drop 97%, and DeepSeek builds frontier models for $500M. Value is flowing to applications, not model providers. Summary: In August 2011, Marc Andreessen wrote 'Why Software Is Eating the World', an essay about how software was transforming industries, disrupting traditional businesses, and revolutionizing the global economy. Recently, Benedict Evans, a former a16z partner, gave a presentation on the generative AI platform shift three years after ChatGPT's launch. His argument in short: we know this matters, but we don't know how. In this article I will try to explain why I find his framing fascinating but incomplete, and why the evidence points toward AI model commoditization rather than durable competitive advantages at the model layer. Evans structures technology history in cycles. Every 10-15 years, the industry reorganizes around a new platform: mainframes (1960s-70s), PCs (1980s), web (1990s), smartphones (2000s-2010s). Each shift pulls all innovation, investment, and company creation into its orbit. Generative AI appears to be the next platform shift, or it could break the cycle entirely. The range of outcomes spans from 'just more software' to a single unified intelligence that handles everything. The pattern recognition is smart, but I think the current evidence points more clearly toward commoditization than Evans suggests, with value flowing up the AI value chain to applications rather than to model providers. Key findings: - Hyperscalers are spending $400B on AI infrastructure in 2025, more than global telecom capex, while API pricing has dropped 97% since GPT-3, pointing to rapid commoditization. - 92% of developers now use AI coding tools, but 40% of CIOs do not expect production AI agent deployment until 2026 or later, showing adoption is deep in pockets but shallow overall. - Consulting firms like Accenture are booking $3B+ in GenAI revenue, but the money comes from integration and process redesign, not from the models themselves. - DeepSeek proved a frontier model can be built for $500M, collapsing the assumption that only the richest labs can compete at the capability frontier. ### [Weather Forecasts Have Improved a Lot](http://philippdubach.com/posts/weather-forecasts-have-improved-a-lot/) Published: 2025-11-22 Description: Four-day forecasts now match one-day accuracy from 30 years ago. How AI models like WeatherNext 2 use CRPS training to preserve extreme weather signals. Summary: Reading the press release for Google DeepMind's WeatherNext 2, I wondered: have weather forecasts actually improved over the past years? Turns out they have, dramatically. A four-day forecast today matches the accuracy of a one-day forecast from 30 years ago. Hurricane track errors that once exceeded 400 nautical miles for 72-hour forecasts now sit below 80 miles. The European Centre for Medium-Range Weather Forecasts reports three-day forecasts now reach 97% accuracy, with seven-day forecasts approaching that threshold. Key findings: - A four-day weather forecast today matches the accuracy of a one-day forecast from 30 years ago, with three-day forecasts now reaching 97% accuracy. - WeatherNext 2 generates forecasts in under a minute on a single TPU, compared to hours on a supercomputer for physics-based models, an up to 10,000x speed improvement. 
- CRPS training preserves sharp spatial features and extreme values that L2 losses blur, solving a key weakness of earlier neural weather models for cyclone and heat wave prediction. - Hurricane track errors dropped from 400+ nautical miles to below 80 miles for 72-hour forecasts, with Google's model performing well against actual hurricane paths this season. ### [The Bicycle Needs Riding to be Understood](http://philippdubach.com/posts/the-bicycle-needs-riding-to-be-understood/) Published: 2025-11-14 Description: Build an LLM agent in 50 lines of Python. Why context engineering and emergent behaviors are best understood through hands-on experimentation, not theory. Summary: Some concepts are easy to grasp in the abstract. Boiling water: apply heat and wait. Others you really need to try. You only think you understand how a bicycle works, until you learn to ride one. You should write an LLM agent—not because they're revolutionary, but because the bicycle needs riding to be understood. Having built agents myself, Ptacek's central insight resonates: the behavior surprises in specific ways, particularly around how models scale effort with complexity before inexplicably retreating. Key findings: - A functioning LLM agent can be built in roughly 50 lines of Python, making the open problems in agent design accessible to individual experimentation in minutes. - In Ptacek's demo, an LLM with ping access autonomously chose multiple Google endpoints without being told to, illustrating both the promise and unpredictability of agent behavior. - Context engineering is not mystical but straightforward programming: managing token budgets, orchestrating sub-agents, and balancing explicit control loops against emergent behavior. ### [AI Models as Standalone P&Ls](http://philippdubach.com/posts/ai-models-as-standalone-pls/) Published: 2025-11-09 Description: OpenAI lost $11.5B in one quarter. But Anthropic CEO Dario Amodei argues each AI model is independently profitable. Here's why the math is complicated. Summary: Microsoft reported earnings for the quarter ended Sept. [...] buried in its financial filings were a couple of passages suggesting that OpenAI suffered a net loss of $11.5 billion or more during the quarter. For every dollar of revenue, they're allegedly spending roughly $5 to deliver the product. These OpenAI losses initially sound like a joke about 'making it up on volume,' but they point to a more fundamental problem facing OpenAI and its competitors. AI companies are locked into continuously releasing more powerful (and expensive) models. If they stop, open-source alternatives will catch up and offer equivalent capabilities at substantially lower costs. This creates an uncomfortable dynamic. If your current model requires spending more than you earn just to fund the next generation, the path to profitability becomes unclear—perhaps impossible. Key findings: - OpenAI lost $11.5B in one quarter, spending roughly $5 for every $1 of revenue because each new model generation costs about 10x more to train. - Dario Amodei argues each AI model is independently profitable: a $100M training run generating $200M in revenue looks like a 2x return when isolated from the next investment cycle. - The per-model profitability thesis breaks down if open-source alternatives close the capability gap within months, compressing the revenue window before training costs are recouped. 
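A toy model of that dynamic, my own illustration rather than anything from the article: each generation roughly doubles its training cost in revenue (the article's $100M run earning $200M), but the next generation costs about ten times more to train, so consolidated cash flow stays negative for as long as the ladder keeps climbing. The overlap of generations and the figures beyond the article's stylized example are assumptions.

```python
# Toy illustration of per-model profitability vs. consolidated cash flow.
# Assumption: each generation earns ~2x its own training cost (per the
# article's $100M -> $200M example) while the next generation costs ~10x more
# to train and is funded out of the same period's cash.

generations = [
    # (training_cost_$M, lifetime_revenue_$M)
    (100, 200),
    (1_000, 2_000),
    (10_000, 20_000),
]

for i, (cost, revenue) in enumerate(generations, start=1):
    standalone_return = revenue / cost          # looks fine in isolation
    next_training = cost * 10 if i < len(generations) else 0
    cash_flow = revenue - cost - next_training  # what the company actually books
    print(f"Gen {i}: standalone return {standalone_return:.1f}x, "
          f"period cash flow {cash_flow:+,} $M")
```

Every generation clears a 2x standalone return, yet the company only turns cash-positive in the final row, the one where it stops training a successor, which is the move the entry argues open-source competition makes impossible.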
### [Working with Models](http://philippdubach.com/posts/working-with-models/) Published: 2025-11-08 Description: Diffusion models corrupt data into noise, then reverse the process. Learn the math with Stefano Ermon's Stanford CS236 course, free on YouTube. Summary: There was this 'I work with Models' joke which I first heard years ago from an analyst working on a valuation model (see my previous post). I guess it has become more relevant than ever: This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. Key findings: - Diffusion models work by gradually corrupting data into noise through a forward process, then learning to reverse that process to generate new samples. - Stanford CS236 by Stefano Ermon covers the full mathematical foundations of deep generative models, from VAEs and GANs to score-based diffusion, and is freely available on YouTube. - The shared mathematical ideas underlying diverse diffusion formulations trace back to linking data distributions to simple priors through a continuum of intermediate distributions. ### [Sentiment Trading Revisited](http://philippdubach.com/posts/sentiment-trading-revisited/) Published: 2025-07-07 Description: A Rutgers study finds news sentiment embeddings from OpenAI models cut stock price prediction errors by 40%. Time-independent models perform just as well. Summary: Interesting new paper on news sentiment embeddings for stock price forecasting that builds on many of the ideas I explored in this project. The research, by Ayaan Qayyum, an Undergraduate Research Scholar at Rutgers, shows that the core concept of using advanced language models for sentiment trading is not only viable but highly effective. The study takes a similar but more advanced approach. Instead of using a model like GPT-3.5 to generate a simple sentiment score, it uses OpenAI's embedding models to convert news headlines into rich, high-dimensional vectors. By training a battery of neural networks on these embedding vectors, the study measures how much the richer representation improves price forecasts. Key findings: - OpenAI news headline embeddings fed into neural networks cut stock price prediction errors by up to 40% compared to models using only price and economic data. - Time-independent models, where training data was shuffled rather than kept chronological, performed as well as time-dependent ones, suggesting markets react to news substance consistently regardless of timing. - Rich embedding vectors capturing full semantic nuance outperformed traditional sentiment scoring that reduces each headline to a single positive, negative, or neutral label. ### [Not All AI Skeptics Think Alike](http://philippdubach.com/posts/not-all-ai-skeptics-think-alike/) Published: 2025-06-12 Description: Apple's Illusion of Thinking paper found AI reasoning models collapse at high complexity. But critics argue the methodology, not the models, may be flawed. Summary: Apple's recent paper 'The Illusion of Thinking' has been widely understood to demonstrate that reasoning models don't 'actually' reason. Using controllable puzzle environments instead of contaminated math benchmarks, they discovered something fascinating: there are three distinct performance regimes when it comes to AI reasoning complexity.
For simple problems, standard models actually outperform reasoning models while being more token-efficient. At medium complexity, reasoning models show their advantage. But at high complexity? Both collapse completely. Here's the kicker: reasoning models exhibit counterintuitive scaling behavior—their thinking effort increases with problem complexity up to a point, then declines despite having adequate token budget. It's like watching a student give up mid-exam when the questions get too hard, even though they have plenty of time left. Key findings: - Apple's paper found three performance regimes: standard models beat reasoning models on simple tasks, reasoning models win at medium complexity, and all models collapse at high complexity. - Reasoning models counterintuitively reduce their thinking effort as problems approach the collapse threshold, even with token budget remaining. - Providing explicit solution algorithms did not improve performance, with collapse occurring at the same complexity threshold regardless of guidance. - Critics argue the Tower of Hanoi tests measure algorithm-following, not reasoning, and that models strategically refuse hundreds of sequential steps rather than failing to think. ### [The Model Said So](http://philippdubach.com/posts/the-model-said-so/) Published: 2025-05-28 Description: LLMs excel at parsing market sentiment and writing reports, but finance demands audit trails and explainable decisions that black box models cannot provide. Summary: LLMs make your life easier until they don't. Their intrinsic complexity and lack of transparency pose significant challenges, especially in the highly regulated financial sector. Unlike other industries where 'the model said so' might suffice, finance demands audit trails, bias detection, and explainable decision-making—requirements that sit uncomfortably with neural networks containing billions of parameters. The research highlights a fundamental tension that's about to reshape fintech: the same complexity that makes LLMs powerful at parsing market sentiment or generating investment reports also makes them regulatory nightmares in a sector where you need to explain every decision to examiners. Key findings: - LLMs with billions of parameters cannot produce the audit trails, bias detection, or explainable decisions that financial regulators require for every consumer-facing outcome. - The same complexity that makes LLMs effective at parsing market sentiment makes them regulatory nightmares in a sector where 'the model said so' is not an acceptable answer. - Hybrid approaches pairing LLMs with deterministic rule-based systems are emerging, but the fundamental tension between model power and regulatory transparency remains unresolved. ### [Trading on Market Sentiment](http://philippdubach.com/posts/trading-on-market-sentiment/) Published: 2025-02-20 Description: GPT-3.5 matched RavenPack's 41% returns in a sentiment analysis trading strategy using 2,072 news headlines. See the full backtest results and comparison. Summary: This post is based in part on a 2022 presentation I gave for the ICBS Student Investment Fund and my seminar work at Imperial College London. As we were looking for new investment strategies for our Macro Sentiment Trading team, OpenAI had just published their GPT-3.5 Model.
After first experiments with the model, we asked ourselves: How would large language models like GPT-3.5 perform in predicting sentiment in financial markets, where the signal-to-noise ratio is notoriously low? And could they potentially even outperform industry benchmarks at interpreting market sentiment from news headlines? The idea wasn't entirely new. Studies [2] [3] have shown that investor sentiment, extracted from news and social media, can forecast market movements. But most approaches rely on traditional NLP models or proprietary systems like RavenPack. With the recent advances in large language models, I wanted to test whether these more sophisticated models could provide a competitive edge in sentiment-based trading. Before looking at model selection, it's worth understanding what makes trading on sentiment so challenging. News headlines present two fundamental problems that any robust system must address. First, headlines are inherently non-stationary. Unlike other data sources, news reflects the constantly shifting landscape of global events, political climates, economic trends, etc. A model trained on COVID-19 vaccine headlines from 2020 might struggle with geopolitical tensions in 2023. This temporal drift means algorithms must be adaptive to maintain relevance. Second, the relationship between headlines and market impact is far from obvious. Consider these actual headlines from November 2020: 'Pfizer Vaccine Prevents 90% of COVID Infections' drove the S&P 500 up 1.85%, while 'Pfizer Says Safety Milestone Achieved' barely moved the market at -0.05%. The same company, similar positive news, dramatically different market reactions. Key findings: - GPT-3.5 returned 41.02% vs RavenPack's 40.99% on 2,072 Dow Jones Newswire headlines from 2018-2022, matching the commercial benchmark at a fraction of the cost - Both sentiment strategies underperformed buy-and-hold (58.13%) in the full bullish period, but outperformed during the volatile 2020-2022 window (22.83% vs 21.00%) - The two models showed a 0.59 sentiment score correlation, agreeing on direction but differing in granularity because GPT provides continuous scores while traditional NLP gives discrete labels - Real-world deployment faces a latency problem: GPT needs seconds to score a headline, while HFT firms act on news within milliseconds ## Investing (17 articles) ### [AI Models Are the New Rebar](http://philippdubach.com/posts/ai-models-are-the-new-rebar/) Published: 2026-03-11 Description: Qwen 3.5-35B runs on a gaming PC and matches Claude Sonnet 4.5. When the commodity version is 95% as good and 97% cheaper, you have a pricing problem. Summary: Qwen 3.5-35B-A3B, a model released by Alibaba in February 2026, runs on a single consumer GPU with 24 gigabytes of VRAM. A secondhand RTX 4090, available for around $2,000, generates 60 to 100 tokens per second with it. On select benchmarks per Alibaba's own evaluations, it matches or beats Claude Sonnet 4.5. The Qwen 3.5 Flash tier costs $0.10 per million input tokens through Alibaba's API. Claude Sonnet 4.5 costs $3.00. That's a 97 percent discount. For comparable performance. Key findings: - Qwen 3.5-35B matches Claude Sonnet 4.5 on select benchmarks at $0.10 per million input tokens versus $3.00, a 97 percent cost gap for comparable performance. - The performance gap between open-source and proprietary AI models shrank from 8 percent to 1.7 percent in a single year, per the Stanford HAI 2025 AI Index. 
- OpenAI's adjusted gross margin fell from 40 to 33 percent in 2025 as inference costs quadrupled to $8.4 billion, while the company lost $13.5 billion in H1 2025. - AI inference prices decline at a median rate of 50x per year for equivalent performance, according to Epoch AI, a pace that dwarfs Moore's Law. ### [AI Capex Arms Race: Who Blinks First?](http://philippdubach.com/posts/ai-capex-arms-race-who-blinks-first/) Published: 2026-03-08 Description: Alphabet's free cash flow is on track to fall 90% in 2026. Amazon's is at $11B. $690B in AI capex is cannibalizing the cash that justified these valuations. Summary: Alphabet's free cash flow is projected to fall roughly 90% in 2026. Not because the business is in trouble. Because the company has committed to spending $83–93 billion more on capital expenditure than it did last year. That is what $660–690 billion in AI capex looks like up close. Amazon guided to $200 billion alone. Meta's long-term debt more than doubled to $58.7 billion to help finance its share. Goldman Sachs projects cumulative 2025–2027 spending across the Big 4 at $1.15 trillion, more than double the $477 billion spent over the prior three years combined. BofA credit strategists found this will consume 94% of operating cash flow minus dividends and buybacks. Key findings: - The Big 4 hyperscalers are on track to spend $610–665 billion in 2026, roughly 70% above 2025 levels, with Goldman Sachs projecting cumulative 2025–2027 spend at $1.15 trillion - Alphabet's free cash flow may fall from $73 billion to roughly $8 billion in 2026 as capex doubles; Amazon's is already compressed to $11 billion TTM with $200B guidance ahead - Direct AI revenue covers roughly 15% of AI-specific capex: Sequoia's David Cahn calculated the ecosystem needs $600 billion in annual revenue to justify current infrastructure spending, against the roughly $50–100 billion it actually generates - Inference costs are falling 50–200x per year (Epoch AI), meaning existing GPU infrastructure may become stranded faster than depreciation schedules assume ### [Every Bulge Bracket Bank Agrees on AI](http://philippdubach.com/posts/every-bulge-bracket-bank-agrees-on-ai/) Published: 2026-03-01 Description: I read 12 AI research reports from Goldman Sachs, JPMorgan, UBS, and 6 other banks. Here's the consensus they're pushing, and what they're not saying. Summary: I spent the last week reading 12 bank AI research reports from nine of the world's largest financial institutions: Goldman Sachs, JPMorgan, Morgan Stanley (three separate reports), UBS, Barclays, Bank of America, HSBC, Citi, Deutsche Bank, and Santander. I wanted to understand how institutions that collectively manage trillions of dollars and employ thousands of analysts actually see this technology heading into 2026: where they agree, where they diverge, and what they're being less than forthcoming about. Key findings: - Not a single report from any of the nine institutions recommends reducing AI exposure. The absence of a bearish voice is itself the most important signal in the entire collection - The macro productivity estimates span from +0.7% to +15% TFP over ten years, using the same underlying academic papers, cherry-picked to support nine different commercial narratives - Only ~10% of US companies are productively using AI and 42% have abandoned GenAI projects. 
The gap between capex commitment and actual adoption is the most underweighted risk in the consensus - AI capex already contributed 1.4–1.5 percentage points to US GDP growth in H1 2025, making infrastructure spending the dominant driver of US economic expansion in that period - Morgan Stanley's historical data shows second-order beneficiaries outperform first-order enablers by 10–100x over long horizons, yet nearly every bank's current positioning favours first-order plays anyway ### [The Absolute Insider Mess of Prediction Markets](http://philippdubach.com/posts/the-absolute-insider-mess-of-prediction-markets/) Published: 2026-02-22 Description: A Google insider made $1.15M on Polymarket in 24 hours. Israeli soldiers bet classified strike timing. Why prediction markets need insider trading regulation. Summary: Someone at Google, or close enough to Google, deposited $3 million into Polymarket on December 3, 2025, bet on 23 separate 'Google Year in Search' outcomes, got 22 right, and walked away with $1.15 million in profit in under 24 hours. One of those bets: that d4vd would be the most-searched person of 2025, purchased at roughly 5 cents when the market gave it a 0.2% probability. The wallet, originally called AlphaRacoon, had previously made over $150,000 correctly predicting the exact launch window of Google's Gemini 3.0 in November 2025. As blockchain engineer Haeju Jeong, who first flagged the account, put it: this is a Google insider milking Polymarket for quick money. The wallet later changed its username to 0xafEe, which might be the most half-hearted attempt at anonymity since an MIT researcher Googled 'how sec detect unusual trade' before insider trading. Key findings: - A suspected Google insider went 22-for-23 on Polymarket, turning $3M into $4.15M in under 24 hours, and no U.S. regulator has acted on any of the three major cases since December 2025 - A Fed working paper found Kalshi's macro markets matched the actual FOMC rate outcome on the day before every meeting since 2022, outperforming traditional forecasting tools in certain windows - Combined prediction market volume hit $44B in 2025, roughly 300x the level from early 2024, while the CFTC has brought zero insider trading enforcement actions - Unchecked insider trading triggers Akerlof's lemons dynamic: market makers widen spreads, uninformed participants leave, and the accuracy gains from one insider's trade are offset by the liquidity collapse that follows ### [The SaaSpocalypse Paradox](http://philippdubach.com/posts/the-saaspocalypse-paradox/) Published: 2026-02-13 Description: AI capex failure and AI replacing all software are mutually exclusive. Why the 2026 SaaSpocalypse is a $2 trillion pricing error, not an extinction event. Summary: The market is simultaneously pricing AI capex failure and AI destroying all software. Both cannot be true. Anthropic released 11 open-source plugins for Claude Cowork on January 30. Apache-2.0 licensed, file-based, running in a macOS-only research preview. Within a week, the IGV software ETF had fallen 32% from its September peak to a 52-week low of $79.65, roughly $2 trillion in market cap had evaporated, and hedge funds had made $24 billion shorting the sector. The RSI hit 18, the most oversold reading since 1990. JP Morgan titled their note 'Software Collapse Broadens with Nowhere to Hide.' Jefferies coined the term SaaSpocalypse. It was the worst software stock crash since the dot-com bust. 
Key findings: - The market is simultaneously punishing hyperscalers for weak AI capex returns and destroying software stocks because AI adoption is so strong it replaces all software, both cannot be true - The IGV software ETF fell 32% to an RSI of 18, the most oversold reading since 1990, while the sector delivered 17% aggregate earnings growth and every major name beat Q4 2025 estimates - Recurring-revenue software with 90%+ gross margins now trades at 32.4x forward earnings versus 43.6x for cyclical semiconductors, an 11.2x inversion that has not persisted historically - Goldman Sachs projects the application software market growing to $780 billion by 2030, with a16z arguing AI expands the addressable market from $350 billion in software to $6 trillion in white-collar services ### [Buying the Haystack Might Not Work This Year](http://philippdubach.com/posts/buying-the-haystack-might-not-work-this-year/) Published: 2026-01-31 Description: a16z sees AI fundamentals thriving with 80% GPU utilization. AQR sees the CAPE at the 96th percentile. Both have data. Both may be right. Summary: I've been reading the January 2026 state of markets reports from Andreessen Horowitz and AQR, and their conclusions on the AI bubble question in 2026 are almost impossible to reconcile. The a16z view is straightforward: AI fundamentals are real, and current prices reflect that reality. Their evidence is compelling. The top 50 private AI companies now generate $40.6 billion in annual revenue. Companies like ElevenLabs and Cursor are hitting $100 million ARR faster than Slack or Twilio ever did. GPUs are running at 80% utilization, compared to the 7% utilization rate for fiber optic cables during the dotcom bubble. This isn't speculation, they argue. It's demand exceeding supply. AQR looks at the same market and sees something else entirely. Their capital market assumptions put the U.S. CAPE ratio at the 96th percentile since 1980. Expected real returns for U.S. large cap equities over the next 5-10 years? 3.9%. For a global 60/40 portfolio, just 3.4%, well below the long-term average of roughly 5% since 1900. Risk premia, in their framework, are compressed across nearly every asset class. The narrative doesn't enter their models. a16z points to earnings growth. The market rally hasn't been driven by multiple expansion, they note, but by actual EPS growth. Tech P/E multiples sit around 30-35x, elevated but nowhere near the 70-80x of 2000. Tech margins have 'lapped the field' at 25%+ compared to 5-8% for the rest of the S&P 500. The fundamentals, they insist, are doing the work. AQR's response would be that fundamentals always look good near peaks. Their research shows a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade. Compressed premia don't announce themselves with blaring headlines. They just quietly erode returns until investors notice they've been running in place. Key findings: - GPU utilization runs at 80% versus 7% for fiber optic cables during the dotcom era, but the U.S. CAPE ratio sits at the 96th percentile since 1980, historically associated with low future returns - Cumulative hyperscaler capex is projected to hit $4.8 trillion by 2030, requiring roughly $1 trillion in annual AI revenue to clear a 10% hurdle rate - Non-U.S. developed markets offer expected returns around 5% versus 3.9% for U.S. 
large caps, a valuation gap that holds even if the AI story is true - AQR estimates a 50% probability that realized equity returns will miss estimates by more than 3 percentage points annually over the next decade ### [The Market Can Stay Irrational Longer Than You Can Stay Solvent](http://philippdubach.com/posts/the-market-can-stay-irrational-longer-than-you-can-stay-solvent/) Published: 2026-01-11 Description: Steve Eisman explains how U.S. equity markets have structurally decoupled from everyday economic reality through concentration and passive investing. Summary: A friend recently recommended Steve Eisman's podcast to me. Eisman, you might recall, is the hedge fund manager portrayed in The Big Short who famously bet against subprime mortgages before the 2008 crisis. In his most recent episode, Eisman laid out a thesis for something that made me uncomfortable ever since the Covid-19 stock market crash recovery: the U.S. equity market has structurally decoupled from everyday economic reality. I've written about market concentration in my 2026 portfolio allocation. But Eisman's point isn't just about concentration. It's about what this concentration means for everyone else. Consider what happens to consumer-exposed sectors. Combined, healthcare, consumer discretionary, and consumer staples have fallen from 38% of the index in 2015 to just 25% today. This matters because roughly 70% of U.S. GDP is consumer-driven. The traditional logic was simple: consumer spending drives the economy, consumer stocks reflect that spending, and therefore the stock market reflects economic health. That relationship has broken down. Key findings: - Consumer-exposed sectors fell from 38% to 25% of the S&P 500 since 2015, even though 70% of U.S. GDP is consumer-driven, breaking the link between the index and everyday economic reality - Index funds now control 60% of flows, buying mechanically in proportion to market cap with no price sensitivity, meaning corrections lose the stabilizing bid that active managers once provided - With NVIDIA at 7.7%, Apple at 6.8%, and Microsoft at 6.1% of the index, most institutional mandates physically prevent managers from holding proportional positions due to risk limits - Massive embedded capital gains create asymmetric liquidity: plenty of buyers on the way up, scarce ones on the way down, because selling triggers taxable events investors delay until forced ### [Praise by Name, Criticize by Category: Warren Buffett Retires at 95](http://philippdubach.com/posts/praise-by-name-criticize-by-category-warren-buffett-retires-at-95/) Published: 2026-01-06 Description: Buffett exits after paying $26.8B in taxes. What 60 years of letters reveal about admitting mistakes, insurance float, and why Abel inherits $300B in cash. Summary: Warren Buffett has stepped down as CEO at 95. Greg Abel inherits a company that paid $26.8 billion in federal income taxes last year, roughly 5% of what all of corporate America paid combined. I do not have much in common with Buffett, but I will miss his shareholder letters. Berkshire's archive is a rare case of a public company explaining decisions candidly to its owners. In the 2024 letter Buffett repeats Tom Murphy's rule: 'Praise by name, criticize by category.' Murphy gave him this advice 60 years ago. The letter closes with another line worth keeping: 'Kindness is costless but priceless.' Key findings: - Berkshire paid $26.8B in federal income taxes last year, roughly 5% of all U.S. 
corporate tax receipts, built on an insurance float engine that generated $9B in underwriting profit and $13.7B in investment income in 2024 - Between 2019 and 2023, Buffett used 'mistake' or 'error' 16 times in shareholder letters, while many Fortune 500 companies never used either word once - Apple at its peak represented 40-50% of Berkshire's public equity portfolio, meaning one stock bought mostly between 2016-2018 drove a substantial share of decade-long returns - Abel inherits roughly $300B in cash and Treasury bills, not because Buffett prefers cash but because nothing available meets Berkshire's price discipline at current valuations ### [How AI is Shaping My Investment Portfolio for 2026](http://philippdubach.com/posts/how-ai-is-shaping-my-investment-portfolio-for-2026/) Published: 2025-12-12 Description: Rebalancing for 2026: reducing S&P 500 at 40× CAPE, adding Europe after Germany's €1T pivot, and bonds at 4.2% yields. Full allocation rationale. Summary: I have two portfolios: (a) long-term, diversified, low-cost ETFs, and (b) collecting diamonds in front of bulldozers, short-term option plays, and some individual stocks I find interesting. Here, we will only look at (a). This essay is structured along five themes I believe to be true for 2026: (1) Market Concentration and High Valuations (2) US Dollar Depreciation Expected Despite Continued Dominance (3) AI Investment Remains Central But Requires Scrutiny (4) European Fiscal Revolution Creates Investment Opportunities (5) Fixed Income Offers Best Prospects Since Global Financial Crisis Key findings: - The S&P 500's Shiller CAPE ratio sits at 40.5, more than double its historical mean of 17.3, with the top 10 companies representing 45% of the index. - Germany's historic fiscal pivot commits over 1 trillion euros to infrastructure and defense, narrowing the US-Europe growth gap from 60bps to 30bps while European equities trade at a 22% discount to global peers. - AI capex is projected to reach $1.3 trillion by 2030 (3.8% of US GDP), exceeding every prior infrastructure boom including broadband, electricity, and the Apollo program. - The US dollar remains roughly 10% overvalued per JP Morgan, with its reserve share declining from 71% in 1999 to 56% in 2025, supporting gold and currency-hedged international exposure. ### [Not Logan Roy: Netflix vs. Paramount's Bidding War](http://philippdubach.com/posts/not-logan-roy-netflix-vs.-paramounts-bidding-war/) Published: 2025-12-09 Description: Netflix's $72B Warner Bros deal vs Paramount's hostile $30/share tender. Deal mechanics, aggregation theory, and why internet distributors win streaming. Summary: In the HBO series Succession, billionaire Logan Roy's children spent four seasons scheming, backstabbing, and making offers to inherit a media empire. This week, the real version played out with more zeros and a $252 billion Oracle stake. Time for a closer look: On Friday, Warner Bros. Discovery's board agreed to sell the company to Netflix for $72 billion. By Monday, Paramount had launched a hostile tender offer directly to shareholders at $30 per share, all cash. In this post I will be going into the gap between those two numbers, streaming economics, aggregator theory, and hostile deal mechanics. The Netflix offer breaks down into three pieces: $23.25 per share in cash, $4.50 per share in Netflix stock subject to a collar, and shares in a spun-off entity called Discovery Global containing CNN and the cable networks that Netflix doesn't want. 
Analysts value that stub somewhere between $2 and $5 per share, which puts the total package at roughly $29.75 to $32.75. Paramount is offering $30 per share in cash for the entire company, including the cable assets. Warner's stock closed Friday at $26.08 and opened Monday around $27.64, which tells you the market expects a bidding war but isn't fully convinced either deal closes. Key findings: - Netflix is acquiring Warner Bros for $72B in equity value while Paramount launched a hostile $30/share all-cash tender offer, backed by $54B in debt and Larry Ellison's $252B Oracle stake. - Netflix commands a $425B market cap because internet distribution has zero marginal cost, while combined legacy studios are worth a fraction, a textbook case of aggregation theory. - If Warner shareholders take Paramount's offer, Warner owes Netflix a $2.8B breakup fee; if Netflix's deal collapses, Netflix owes $5.8B, one of the largest reverse breakup fees on record. ### [Nike's Crisis and the Economics of Brand Decay](http://philippdubach.com/posts/nikes-crisis-and-the-economics-of-brand-decay/) Published: 2025-12-02 Description: Nike lost $28B by weakening product development, athlete partnerships, and marketing simultaneously. Data-driven analysis of how complementary assets collapse. Summary: Nike's $28 Billion Value Destruction What it sounds like is that the CEO has the wrong people making the wrong decisions across the strongest brand or one of the strongest brands in consumer history. This quote by Scott Galloway on his podcast is from July 2024. In March 2025, Nike reported its worst revenue decline in nearly five years: an 11.5% drop to $11.01 billion. Digital sales fell 20%, app downloads decreased 35%, and store foot traffic declined 11%. Nike's crisis reveals how competitive advantages work, and how quickly they can disappear when the company that once captured roughly half of the US athletic footwear market systematically weakens its own foundations. Key findings: - Nike terminated hundreds of wholesale accounts to capture 50% direct margins over 30-35% wholesale margins, but competitors On and Hoka immediately filled the vacated shelf space and grew from a combined $682M to $3.2B in revenue between 2020 and 2025. - On grew from $330M to $1.8B revenue and Hoka from $352M to $1.4B between 2020 and 2025, exploiting the product and distribution gap Nike created under CEO John Donahoe. - Nike's gross margins compressed 190 basis points to 42.7% as the direct-to-consumer shift, organizational restructuring, and marketing pivot destroyed three complementary assets simultaneously. - Nike manufactures 95% of shoes in Southeast Asia and faces $1B to $1.5B in additional tariff costs, compounding a crisis rooted in strategy rather than trade policy. ### [Michael Burry's $379 Newsletter](http://philippdubach.com/posts/michael-burrys-379-newsletter/) Published: 2025-11-28 Description: Michael Burry launches Substack warning AI markets mirror 1999. His Nvidia-Cisco comparison, the GPU depreciation debate, and what hyperscalers need to justify capex. Summary: Michael Burry (who in your head probably looks like Christian Bale thanks to The Big Short), the investor who famously predicted the 2008 housing crash, has launched a Substack newsletter after deregistering his hedge fund. The $379 annual subscription capitalizes on the 1.6 million followers he's built on X, offering what he describes as his 'sole focus' going forward. 
The newsletter's inaugural post (which he kindly made accessible for free as a Thanksgiving gift today) takes readers back to 1999, when Burry was a 27-year-old neurology resident at Stanford making $33,000 annually while carrying $150,000 in medical school debt. There he wrote his Valuestocks.net article 'Buffett Revisited'. A fellow resident casually mentioned making $1.5 million on Polycom stock. Physicians crowded around terminals checking stocks while patients waited. In that environment, Burry was writing investment analysis late at night, getting paid $1 per word by MSN Money under the pen name 'Value Doc.' His VSN Fund returned 68.1% in 1999, and by February 2000, the San Francisco Chronicle noted he had shorted Amazon. Fourteen days after that article appeared, the NASDAQ topped. It was a peak it wouldn't revisit for 15 years. Key findings: - Burry argues Nvidia is Cisco circa 2000: the picks-and-shovels supplier at the center of an infrastructure cycle built on demand forecasts that may not materialize - If hyperscalers shortened GPU depreciation from 6 years to 3, companies like Alphabet would take roughly a 10% hit to net profit, exposing weaker economics behind AI capex - At $90B+ annual AI capex with 5-year depreciation and 10% WACC, Alphabet needs about $40B per year in incremental AI-attributable revenue just to break even on infrastructure spend - One key difference from 2000: Cisco's forward P/E was around 200 at its peak, while Nvidia's is under 40 ### [Everything is a DCF Model](http://philippdubach.com/posts/everything-is-a-dcf-model/) Published: 2025-10-19 Description: Michael Mauboussin's argument that every cash-generating asset is valued through a DCF model. Why this Morgan Stanley paper changed how I think about value. Summary: A brilliant piece of writing from Michael Mauboussin and Dan Callahan at Morgan Stanley that was formative in what I personally believe when it comes to valuation. […] we want to suggest the mantra 'everything is a DCF model.' The point is that whenever investors value a stake in a cash-generating asset, they should recognize that they are using a discounted cash flow (DCF) model. […] The value of those businesses is the present value of the cash they can distribute to their owners. This suggests a mindset that is very different from that of a speculator, who buys a stock in anticipation that it will go up without reference to its value. Investors and speculators have always coexisted in markets, and the behavior of many market participants is a blend of the two. Key findings: - Mauboussin's core claim: every valuation of a cash-generating asset is implicitly a DCF model, whether the investor builds one explicitly or not. - The framework draws a clean line between investors (who price future cash flows) and speculators (who buy in anticipation of price increases without reference to value). - DCF does not apply to gold, art, or crypto because these assets produce no cash flows, meaning their prices are set purely by supply and demand. ### [Gambling vs. Investing](http://philippdubach.com/posts/gambling-vs.-investing/) Published: 2025-05-30 Description: Kalshi's CFTC-regulated exchange now offers sports betting nationwide. If you can bet on oil futures, why not NFL touchdowns? The line is thinner than you think. Summary: Kalshi, a prediction market startup, is using its federal financial license to offer sports betting nationwide, even in states where it's not legal.
The move has earned them cease-and-desist letters from state gaming regulators, but CEO Tarek Mansour isn't backing down: We can go one by one for every financial market and it would fall under the definition of gambling. So what's the difference? It's a question that cuts to the heart of modern finance. The founders argue that Wall Street blurred the line between investing and gambling long ago, and casting Kalshi as the latter is inconsistent at best. They have a point—if you can bet on oil futures, Nvidia's stock price, or interest rate movements, why is wagering on NFL touchdowns more objectionable? Key findings: - Kalshi uses its federal CFTC license to offer sports betting nationwide, even in states where gambling is illegal, arguing federal financial regulation preempts state law. - 79% of Kalshi's recent trading volume is sports-related, making the distinction between 'prediction market' and 'sportsbook' largely semantic. - A Kalshi board member is awaiting confirmation to lead the CFTC, the very agency that previously challenged the platform's legality. ### [Passive Investing's Active Problem](http://philippdubach.com/posts/passive-investings-active-problem/) Published: 2025-02-15 Description: Research shows passive investing makes markets more volatile as index fund growth amplifies each trade's price impact while active managers lag behind. Summary: (1) A new academic paper suggests the rise of passive investing may be fueling fragile market moves. (2) According to a study to be published in the American Economic Review, evidence is building that active managers are slow to scoop up stocks en masse when prices move away from their intrinsic worth. (3) Thanks to this lethargic trading behavior and the relentless boom in benchmark-tracking index funds, the impact of each trade on prices gets amplified, explaining how sell orders can induce broader equity gyrations Key findings: - Research to be published in the American Economic Review finds that passive fund growth amplifies each individual trade's price impact because active managers are too slow to arbitrage mispricings - When most capital is on autopilot through index funds, the few remaining active traders exert disproportionate influence, turning ordinary sell orders into broader market swings - Passive investing's core benefits still hold for most investors, but the increasingly passive market structure has unintended systemic consequences during periods of stress ### [Crypto Mean Reversion Trading](http://philippdubach.com/posts/crypto-mean-reversion-trading/) Published: 2024-11-11 Description: How I built a crypto mean reversion trading bot using PELT change point detection on Kraken, targeting altcoin price overreactions with automated execution. Summary: In late 2021, Lars Kaiser's paper on seasonality in cryptocurrencies inspired me to use my Kraken API Key to try and make some money. 
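As a rough illustration of the bot's trigger logic (the entry and exit rule summarized in the key findings below), here is a minimal pandas sketch. The bar size, rolling baseline, and z-score construction are assumptions for illustration, not the exact production settings.

```python
import numpy as np
import pandas as pd

def mean_reversion_signals(close: pd.Series,
                           lookback: int = 24,
                           z_threshold: float = 4.0) -> pd.DataFrame:
    """Flag bars where the trailing ~2-hour return sits more than `z_threshold`
    standard deviations below its rolling baseline, then exit ~2 hours later.

    `close` is an intraday price series with one row per bar; with 5-minute
    bars, lookback=24 approximates the 2-hour window described in the post.
    """
    window_return = close.pct_change(lookback)          # trailing ~2h return per bar
    baseline = window_return.rolling(30 * lookback)     # longer history to estimate mean/std
    zscore = (window_return - baseline.mean()) / baseline.std()
    entry = zscore < -z_threshold                        # buy the overreaction
    exit_ = entry.shift(lookback, fill_value=False)      # sell ~2 hours after entry
    return pd.DataFrame({"zscore": zscore, "entry": entry, "exit": exit_})

# The post also mentions PELT change-point detection for confirmation; with the
# `ruptures` package that step would look roughly like:
#   import ruptures as rpt
#   breakpoints = rpt.Pelt(model="rbf").fit(close.to_numpy()).predict(pen=10)

# Example on synthetic 5-minute bars (real use would pull OHLC data from Kraken):
idx = pd.date_range("2021-12-01", periods=2000, freq="5min")
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(rng.normal(0, 0.002, len(idx)).cumsum()), index=idx)
print(mean_reversion_signals(prices).query("entry").head())
```

A 4-standard-deviation threshold makes entries rare by construction, which fits the post's framing of betting on occasional overreactions in thin, retail-dominated altcoin order books rather than trading constantly.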
A quick summary of the paper: (1) Kaiser analyzes seasonality patterns across 10 cryptocurrencies (Bitcoin, Ethereum, etc.), examining returns, volatility, trading volume, and spreads (2) Finds no consistent calendar effects in cryptocurrency returns, supporting weak-form market efficiency (3) Observes robust patterns in trading activity - lower volume, volatility, and spreads in January, weekends, and summer months (4) Documents significant impact of January 2018 market sell-off on seasonality patterns (5) Reports a 'reverse Monday effect' for Bitcoin (positive Monday returns) and 'reverse January effect' (negative January returns) (6) Trading activity patterns suggest crypto markets are dominated by retail rather than institutional investors. Key findings: - The bot bought altcoins on Kraken when prices dropped more than 4 standard deviations over a 2-hour window, then sold automatically after 2 hours, betting on mean reversion - PELT change point detection identified structural breaks in ETH price series, providing signal confirmation for when statistical properties of the time series shifted - Major cryptos like BTC and ETH are becoming more efficient, but smaller altcoins with thin order books and retail-dominated trading still exhibit exploitable mean reversion patterns ### [My First 'Optimal' Portfolio](http://philippdubach.com/posts/my-first-optimal-portfolio/) Published: 2024-03-15 Description: How I built Python portfolio optimization tools, tripled the Sharpe ratio from 0.65 to 1.68, and published the results as an academic paper on MPT. Summary: My introduction to quantitative portfolio optimization happened during my undergraduate years, inspired by Attilio Meucci's Risk and Asset Allocation and the convex optimization teachings of Diamond and Boyd at Stanford. With enthusiasm and perhaps more confidence than expertise, I created my first 'optimal' portfolio. What struck me most was the disconnect between theory and accessibility. Modern Portfolio Theory had been established since 1990, yet the optimization tools remained largely locked behind proprietary software. Key findings: - Mean-variance optimization tripled the Sharpe ratio from 0.65 to 1.68 while cutting volatility from 14.4% to 5.6% at the same 9.4% return - Out-of-sample testing across the 2018 bear market and 2019 bull market showed consistent CVaR reduction and improved risk-adjusted returns - The project was published as an academic paper to fill the gap between established MPT theory and the lack of accessible open-source Python optimization tools at the time ## Quantitative Finance (7 articles) ### [Three Kinds of Not-Knowing](http://philippdubach.com/posts/three-kinds-of-not-knowing/) Published: 2026-03-12 Description: Knightian uncertainty splits not-knowing into risk, uncertainty, and ignorance. A century after Knight and Keynes, most of investing still ignores the split. Summary: Investing at the Edge of Knowledge, Part 1 David Ricardo made a fortune buying British government bonds four days before the Battle of Waterloo. He was not a military analyst. He had no basis to compute the odds of Napoleon's defeat, or victory, or any of the ambiguous outcomes in between. But he understood something that most of his contemporaries did not: the nature of his own ignorance was the same as everyone else's, the seller was desperate, competition was thin, and the pounds he'd gain if Wellington won were worth far more than the pounds he'd lose if Wellington fell. 
Key findings: - Zeckhauser's 2006 framework splits not-knowing into risk (known distributions), uncertainty (unknown probabilities), and ignorance (undefined states): most of finance covers only the first. - Buffett wrote a $1.5B California earthquake reinsurance policy that the capital markets couldn't place because institutional models required probability estimates nobody had. - The IGV software ETF fell 32% while sector earnings grew 17% in early 2026 because investors couldn't define the possible states of AI disruption, not just estimate their probabilities. - Knight and Keynes independently argued in 1921 that not all uncertainty reduces to calculable probability, but the discipline chose formalization and both arguments lost for a century. ### [Long Volatility Premium](http://philippdubach.com/posts/long-volatility-premium/) Published: 2026-02-14 Description: One River's data shows beta-adjusted long volatility outperformed the S&P 500 over 40 years. Goldman, AQR, and Universa agree on the mechanism but disagree on implementation. A synthesis of the evidence. Summary: The real value of tail hedging is not in the hedge itself. It's in what the hedge enables. In The Variance Tax I wrote about the ½σ² formula: compound returns equal arithmetic returns minus half the variance, and because the penalty is quadratic, large drawdowns destroy wealth in ways that are hard to recover from. A portfolio that falls 50% needs 100% just to break even. That piece was about the problem. This one is about a potential solution, and about whether paying for crash protection can actually improve total returns rather than drag them. Key findings: - One River's 40-year data shows a beta-adjusted long volatility overlay improved S&P 500 total returns while reducing drawdowns, because neutralizing the put's short-delta isolates convexity that pays off in crashes - A 3.3% allocation to Universa with the rest in the S&P 500 compounded at 12.3% annually over 10 years, beating the index by truncating the variance tax on compound returns - AQR finds puts and trend-following are complementary: puts returned over 42% during the sudden COVID crash while trend-following excelled in protracted bear markets like the dot-com bust - Several popular tail-risk strategies including short-dated VIX futures underperformed a simple cash allocation by 355 basis points, proving implementation matters more than the concept ### [Variance Tax](http://philippdubach.com/posts/variance-tax/) Published: 2026-02-06 Description: Variance drain is the hidden cost of volatility: why a portfolio averaging +10% can lose money. The ½σ² formula explains the gap between paper and real returns. Summary: Let's say your portfolio returned +60% in 2024, then fell 40% in 2025. That's an annualized average return of +10%. Actual return after two years: minus 4% (i.e $100 * 1.6 * 0.6 = $96). That 14-point gap is what we call the variance tax aka variance drain or volatility drag and it's one of the least intuitive forces in investing. Take any series of returns with arithmetic mean μ and volatility σ. 
The compound growth rate, the one that actually determines your wealth, is approximately g ≈ μ − ½σ². Key findings: - Variance drain equals ½σ²: doubling volatility quadruples the cost to compound returns - The Kelly criterion (L* = (μ-r)/σ²) falls directly out of the variance drain formula, giving the leverage that maximizes compound growth - Half-Kelly sizing sacrifices ~25% of theoretical growth but dramatically reduces drawdown risk from estimation error - Same 10% arithmetic return at 50% vol loses more than half your money over 30 years; at 0% vol it reaches $1,745 ### [Is Private Equity Just Beta With a Lockup?](http://philippdubach.com/posts/is-private-equity-just-beta-with-a-lockup/) Published: 2026-01-29 Description: AQR's 2026 data shows private equity returning 4.2% versus 3.9% for public equities. The 30bp illiquidity premium barely justifies years of lockup. Summary: The pitch used to be simple: accept illiquidity, get rewarded. Lock up your capital for seven years, tolerate capital calls and J-curves, and in exchange you'd earn returns that public markets couldn't touch. It was the defining bargain of institutional investing for two decades. AQR's latest capital market assumptions make for uncomfortable reading if you're an allocator to private markets. Their expected real return for U.S. buyouts over the next 5-10 years is 4.2%. For U.S. large cap public equities, it's 3.9%. That's a 30 basis point premium for accepting years of lockup, unpredictable capital calls, limited transparency, and the very real risk of picking the wrong manager. Private credit looks even worse. Expected returns dropped 0.5 percentage points year over year as spreads narrowed and base rates came down. The asset class that was supposed to be the sensible alternative to stretched equity valuations now offers less compensation than it did twelve months ago. Key findings: - AQR's 2026 assumptions show U.S. buyouts returning 4.2% versus 3.9% for public equities, a 30bp premium that barely justifies years of lockup and manager selection risk - Venture capital dispersion is extreme: top decile managers earn 31.7% IRR while bottom decile return negative 7%, meaning average returns compress as capital floods in - 87% of U.S. companies with over $100M in revenue are now private, and 55% of median value for 2020-2023 IPOs was created before going public, up from 12% for 2014-2019 IPOs - Private credit expected returns dropped 0.5 percentage points year over year to 2.6%, offering less compensation than twelve months ago as spreads narrowed ### [Against All Odds: The Mathematics of 'Provably Fair' Casino Games](http://philippdubach.com/posts/against-all-odds-the-mathematics-of-provably-fair-casino-games/) Published: 2026-01-25 Description: Statistical analysis of 20,000 crash game rounds verifies the 97% RTP claim. But 179 rounds per hour means expected losses exceed 500% of wagers hourly. Summary: Gambling can be harmful and lead to significant losses. Participation is subject to local laws and age restrictions. Always gamble responsibly. Need help? Visit BeGambleAware.org. Crash games represent a category of online gambling where players place bets on an increasing multiplier that can 'crash' at any moment. The fundamental mechanic requires players to cash out before the crash occurs; successful cash-outs yield the bet amount multiplied by the current multiplier, while failure results in total loss of the wager.
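A minimal Monte Carlo sketch of that mechanic, assuming the inverse-multiplier crash distribution cited in the key findings below (P(crash ≥ m) = 0.97/m); the round counts and cash-out targets are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def crash_strategy_ev(n_rounds: int, cashout_target: float, stake: float = 1.0) -> float:
    """Average profit per round for a fixed cash-out strategy.

    Assumes the crash multiplier M satisfies P(M >= m) = 0.97 / m for m >= 1,
    so roughly 3% of rounds bust before even reaching 1x (the 97% RTP design).
    """
    u = rng.random(n_rounds)
    crash_multiplier = 0.97 / u                     # inverse-CDF sample of the crash point
    wins = crash_multiplier >= cashout_target
    profit = np.where(wins, stake * (cashout_target - 1.0), -stake)
    return float(profit.mean())

for target in (1.5, 2.0, 5.0):
    ev = crash_strategy_ev(1_000_000, target)
    # Analytically: (0.97 / target) * (target - 1) - (1 - 0.97 / target) = -0.03,
    # i.e. about a 3% loss per unit staked regardless of the chosen target.
    print(f"cash out at {target:>4}x -> mean profit per 1.00 staked: {ev:+.4f}")
```

The simulation simply recovers the 3% house edge implied by the 0.97/m distribution; at the 179 rounds per hour cited below, that per-round edge compounds into the hourly loss figures the analysis reports.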
Key findings: - Statistical analysis of 20,000 crash game rounds confirms the 97% RTP claim: the estimated probability exponent is 1.98 versus a theoretical 2.0, within 2.2% accuracy - At 179 rounds per hour with 16-second median intervals and a 3% house edge per round, players face expected losses exceeding 500% of amounts wagered per hour - Monte Carlo simulations of 10,000 sessions across four strategies (1.5x to 5x cash-outs) confirm every single strategy produces negative expected returns - The probability of reaching multiplier m before crashing equals 0.97/m, so a 2x target succeeds 48.5% of the time while 100x works just 1.1% of rounds ### [It Just Ain’t So](http://philippdubach.com/posts/it-just-aint-so/) Published: 2025-06-15 Description: Are stock returns normally distributed? Formal normality tests reject this assumption for most equity indices, with major implications for risk management. Summary: It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so. This (not actually) Mark Twain quote from The Big Short captures the sentiment of realizing that some foundational assumptions might be empirically wrong. A recent article by Anton Vorobets that I came across in Justina Lee's Quant Newsletter presents compelling evidence that challenges one of the field's fundamental statistical assumptions, that asset returns follow normal distributions. Using 26 years of data from 10 US equity indices, he ran formal normality tests (Shapiro-Wilk, D'Agostino's K², Anderson-Darling) and found that the normal distribution hypothesis gets rejected in most cases. The supposed 'Aggregational Gaussianity' that academics invoke through Central Limit Theorem arguments? It's mostly wishful thinking enabled by small sample sizes. As Vorobets observes: Key findings: - Formal normality tests (Shapiro-Wilk, Anderson-Darling) reject the normal distribution hypothesis for most U.S. equity indices across 26 years of data - Fat tails mean extreme price movements occur far more often than standard models predict, so portfolios built on Gaussian assumptions systematically underestimate downside risk - The Central Limit Theorem defense for normality is mostly wishful thinking enabled by small sample sizes, not supported by large empirical datasets - CVaR optimization with Monte Carlo simulation offers a practical alternative that accounts for the actual shape of left tails rather than assuming them away ### [Beyond Monte Carlo: Tensor-Based Market Modeling](http://philippdubach.com/posts/beyond-monte-carlo-tensor-based-market-modeling/) Published: 2025-05-11 Description: UBS paper uses Transition Probability Tensors to bridge machine learning and arbitrage-free derivatives pricing, offering a faster alternative to Monte Carlo. Summary: A fascinating new paper from Stefano Iabichino at UBS Investment Bank explores what happens when you take the attention mechanisms powering modern AI and apply them to Wall Street's most fundamental pricing problems, tackling what might be quantitative finance's most intractable challenge. The problem is elegantly simple yet profound: machine learning models are great at finding patterns in historical data, but financial theory demands that arbitrage-free prices be independent of past information. As the authors put it: Key findings: - UBS's Transition Probability Tensors simulated 210 investment strategies across 100,000 market scenarios in 70 seconds, offering a tractable alternative to Monte Carlo. 
- Machine learning models learn from historical data, but the First Fundamental Theorem of Finance requires arbitrage-free prices to be independent of past information. - The tensor framework adapts dynamically to volatility regimes, shifting attention toward tail events during market stress, similar to attention mechanisms in transformers. ## Macro (9 articles) ### [People Live in Levels, Not Rates](http://philippdubach.com/posts/people-live-in-levels-not-rates/) Published: 2026-02-28 Description: Prices rose 25% since 2020 and won't come back. The levels-vs-rates problem explains the vibecession, the Stewart-Thaler debate, and why nobody trusts economists. Summary: Economics doesn't take into account what's best for society. The goal of economics in a capitalist system is to make the most amount of money for your shareholders. That's Jon Stewart, telling a Nobel laureate what his own field is about. On February 4, Stewart hosted Richard Thaler on 'The Weekly Show' to discuss behavioral economics. Thaler, the Chicago Booth professor who won the 2017 Nobel for his work on how real humans deviate from rational-agent models, spent 92 minutes patiently explaining things Stewart had already decided weren't true. Jason Furman, Harvard professor and former Obama CEA chair, called it 'the single worst interview I've ever done' (referencing his own 2024 Stewart appearance). That tweet hit 754,000 views. Jerusalem Demsas wrote the sharpest rebuttal, arguing Stewart 'has no idea what economics actually is.' Key findings: - Cumulative CPI is up ~25% since 2020 (groceries up 29.4%, housing up 30-45%) and none of it reverses when inflation falls to 2.4% - Bottom-quartile wage growth collapsed from 7.5% to 3.5% in 2025, reversing pandemic-era compression that had closed a third of the post-1979 inequality gap - Consumer sentiment is at the 3rd percentile of its historical range despite 4.4% GDP growth, not irrational pessimism but a measurement problem - Stewart reinvented the carbon tax while arguing against the economist who proposed it, the economics communication problem in 92 minutes ### [Europe's $24 Trillion Payment Breakup Is Really a Bet on Infrastructure Arbitrage](http://philippdubach.com/posts/europes-24-trillion-payment-breakup-is-really-a-bet-on-infrastructure-arbitrage/) Published: 2026-02-16 Description: The EuroPA alliance connected 130 million users across 13 countries overnight. But this isn't really about sovereignty. It's an infrastructure arbitrage exploiting a 100-120bps spread between card network fees and SEPA Instant rails, accidentally protected by the EU's own regulation. Summary: On February 2, 2026, the European Payments Initiative signed a Memorandum of Understanding with the Alliance EuroPA, a consortium linking Spain's Bizum, Italy's Bancomat, Portugal's SIBS, and the Nordic Vipps MobilePay system. The deal connects 130 million users across 13 countries into a single interoperable payment network. Headlines framed it as Europe breaking up with Visa and Mastercard. The actual story is more interesting: Europe is attempting an infrastructure arbitrage that, if it works, could reprice how money moves across the continent. 
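As a back-of-envelope check on the size of that arbitrage, here is a small sketch using the volume and fee figures from the key findings below; the card fee is an upper-bound estimate, so treat the output as illustrative only.

```python
# Rough sizing of the fee arbitrage, using figures cited in the key findings below.
card_volume = 4.7e12      # annual Visa/Mastercard European transaction volume, USD
card_fee = 0.020          # merchant cost on card rails ("up to 2%"), upper bound
wero_fee = 0.0077         # Wero's proposed account-to-account pricing

spread_bps = (card_fee - wero_fee) * 1e4
annual_fees_at_stake = card_volume * (card_fee - wero_fee)

print(f"fee spread: ~{spread_bps:.0f} bps (the piece cites 100-120 bps as the structural gap)")
print(f"merchant fees at stake on that volume: ~${annual_fees_at_stake / 1e9:.0f}B per year")
```

The point is scale: a spread of roughly one percentage point applied to trillions in annual volume is what makes the economics, rather than the sovereignty framing, the real story.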
Key findings: - The EuroPA alliance connected 130 million users across 13 countries overnight, giving Wero the scale to challenge Visa and Mastercard's $4.7 trillion European transaction volume - Card transactions cost European merchants up to 2% versus Wero's proposed 0.77%, a 100-120 basis point structural arbitrage because account-to-account payments skip the card network layer entirely - The EU's 2015 interchange cap backfired: Visa and Mastercard shifted revenue to unregulated scheme fees that rose 33.9% between 2018 and 2022, nearly doubling the net merchant service charge - Mastercard has over 900 million branded cards in EU circulation versus Wero's 47 million users, and German adoption sits at only 5% of transaction volume despite being the first launch country ### [Britain's Strategic Limbo](http://philippdubach.com/posts/britains-strategic-limbo/) Published: 2026-01-28 Description: Britain faces strategic isolation: locked out of EU defense cooperation, unwilling to join Trump's coalition. The mid-Atlantic bridge has nowhere to land. Summary: The UK is the country with no bloc. At Davos, Britain refused to join Trump's Board of Peace, citing commitment to international law and rejection of the 'pay-to-play' model. France, Germany, Sweden, Norway made the same choice. The difference is that those countries have somewhere else to go. Britain doesn't. The SAFE instrument, the EU's €150 billion fund for joint defense procurement, is designed explicitly for strategic autonomy. Strict 'Buy European' provisions limit non-EU subcontractors to 15-35% of contract value, phased out within two years. Canada, remarkably, negotiated access and now has preferential treatment on par with EU firms. The UK remains excluded. Key findings: - The EU's SAFE fund limits non-EU subcontractors to 15-35% of contract value, and the UK rejected participation over sovereignty concerns that mirror the logic of Brexit itself - Canada negotiated SAFE access on par with EU firms while Britain remains excluded, illustrating that principles without alternatives is just isolation - Procurement cycles last decades, so structural exclusion from European defense contracts now means the UK defense industrial base erodes with each passing year ### [The Rise of Middle Power Realism](http://philippdubach.com/posts/the-rise-of-middle-power-realism/) Published: 2026-01-27 Description: At Davos 2026, Carney told allies to take down the signs of the liberal order. Middle powers are learning to navigate between giants without illusions. Summary: At Davos 2026, Canadian Prime Minister Mark Carney delivered a speech that received something rare at these gatherings: a standing ovation. Carney told the assembled elites what they already knew but hadn't said aloud: the world is not in a 'transition' but a 'rupture.' The speech drew on Václav Havel's 1978 essay The Power of the Powerless, specifically the parable of the greengrocer who displays the slogan 'Workers of the World, Unite!' in his shop window. The grocer doesn't believe the slogan. He displays it to signal submission, to live in harmony with the regime. Carney's application was pointed: for years, US allies have displayed the signs of the liberal international order, pretending the partnership was mutual, that rules mattered, that values were shared. Even as reality diverged. 
Key findings: - At Davos 2026, Carney declared the world has experienced 'a rupture, not a transition' and used Havel's greengrocer parable to argue that allies have been displaying signs of a liberal order that no longer exists - Canada joined the EU's SAFE defense fund, a 150 billion euro procurement program, becoming the first non-European G7 nation with preferential access to European defense markets - Canada secured a preliminary trade deal with China on 49,000 EVs at 6.1% tariff, compared to the 100% tariff the U.S. imposes, demonstrating the leverage that comes from diversified partnerships - The EU threatened to deploy its Anti-Coercion Instrument against the United States during the Greenland crisis, the first time the bloc signaled willingness to trade-war its primary security guarantor ### [Big in Japan](http://philippdubach.com/posts/big-in-japan/) Published: 2026-01-19 Description: Japan holds $5 trillion in foreign assets. With 30-year JGB yields now above 3%, the carry trade that defined Japanese investing faces new friction. Summary: Japan holds roughly $5 trillion in foreign assets. The US alone accounts for ¥342 trillion in bonds and equities. Japanese 30-year yields sat below 1% from 2019 through early 2024. They're now above 3%. The yield spread between developed market bonds and JGBs has collapsed from 400 basis points to roughly 100. The yen carry trade that defined Japanese institutional behavior since the 1990s, borrow cheap at home and invest abroad for yield, suddenly has added friction. Key findings: - Japan holds roughly $5 trillion in foreign assets and is the largest foreign holder of U.S. Treasuries at over $1.1 trillion - Japanese 30-year government bond yields rose from below 1% through early 2024 to above 3%, collapsing the yield spread versus developed market bonds from 400 basis points to roughly 100 - The August 2024 yen carry trade unwind dropped the S&P 6% in three days, and that was just positioning adjustment, not actual repatriation of Japan's institutional foreign holdings - Treasury market depth has deteriorated since 2020, meaning a sustained seller of size would arrive into a market less equipped to absorb flow than at any point since the GFC ### [Repo might be even bigger than we thought](http://philippdubach.com/posts/repo-might-be-even-bigger-than-we-thought/) Published: 2026-01-13 Description: New OFR data reveals $12.6 trillion in daily repo exposures—$700 billion larger than previous estimates. The plumbing of modern money remains poorly understood. Summary: Finance is anthropological That's Zoltan Pozsar, the Hungarian-American economist who mapped the plumbing of modern money before most people knew there was plumbing to map. When he said it to Bloomberg in 2019, he was trying to explain why repo markets (the overnight lending infrastructure that lubricates trillions in daily transactions) had just seized up in ways the Federal Reserve didn't anticipate. I've written about Pozsar's work before, particularly his 'Bretton Woods III' thesis about the shifting role of the dollar. But his earlier research on shadow banking and repo markets feels increasingly relevant as we enter 2026. In December 2025, the Office of Financial Research published new data on the size of the U.S. repo market. The number: $12.6 trillion in average daily exposures. That's roughly $700 billion larger than previous estimates; a measurement error roughly the size of the entire Swiss banking system. Key findings: - New OFR data puts the U.S. 
repo market at $12.6 trillion in daily exposures, $700 billion larger than previous estimates, a measurement gap roughly the size of the Swiss banking system - Bilateral repo accounts for $5 trillion of daily activity, roughly 40% of the market, and was essentially invisible to regulators until OFR transaction-level collection reached full implementation in July 2025 - The Fed's Standing Repo Facility hit record usage of $74.6B on December 31, 2025, while reserves fell to $2.8 trillion, their lowest in four years - Only 61.8% of repo collateral is Treasuries, leaving substantial room for corporate bonds and agency MBS that can gap in value during stress ### [Pozsar's Bretton Woods III: Three Years Later [2/2]](http://philippdubach.com/posts/pozsars-bretton-woods-iii-three-years-later-2/2/) Published: 2025-10-26 Description: Gold above $4,000, Treasury holdings below $7T, but the dollar still dominates 88% of FX volumes. What Pozsar's Bretton Woods III got right and wrong. Summary: Start by reading Pozsar's Bretton Woods III: The Framework [1/2] Now, what actually happened in the three years since Pozsar published the Bretton Woods III framework? (1) Dollar reserve diversification is happening, but gradual: Foreign central bank Treasury holdings declined from peaks exceeding $7.5 trillion to levels below $7 trillion. This represents steady diversification away from dollar-denominated assets, though not a dramatic collapse. (2) Gold has performed strongly: From roughly $1'900/oz when Pozsar published his dispatches to peaks above $4'000/oz today, gold has appreciated substantially, consistent with increased central bank gold buying and demand for 'outside money.' (3) Alternative payment systems are developing: Various nations continue building infrastructure for non-dollar trade settlement. While these systems remain in preliminary stages rather than fully operational alternatives to SWIFT, development timelines could speed up following specific triggering events. (4) The dollar itself has remained strong: Perhaps surprisingly given predictions of reserve currency decline, the dollar achieved its best performance against a basket of major currencies since 2015 in 2024. The DXY index (which tracks the dollar against major trading partners) fell about 11% this year, marking the end of this decade-long rally. (5) Commodity collateral is increasingly important: Research on commodities as collateral shows that under capital controls and collateral constraints, investors import commodities and pledge them as collateral. Higher collateral demands increase commodity prices and affect the inventory-convenience yield relationship. Key findings: - Foreign central bank Treasury holdings fell from $7.5T to below $7T while gold rose from $1,900 to above $4,000/oz, consistent with gradual reserve diversification away from dollar assets. - Foreign ownership of U.S. Treasuries dropped from above 50% to 30%, but the dollar still dominates 88% of FX volumes, showing de-dollarization is real but slow. - The dollar posted its best year since 2015 in 2024 before declining sharply in 2025, complicating any simple narrative of dollar collapse or reserve currency decline. - Pozsar's most durable insight: central banks control the nominal domain but not the real domain, meaning supply-driven commodity inflation does not respond well to rate hikes. 
### [Pozsar's Bretton Woods III: The Framework [1/2]](http://philippdubach.com/posts/pozsars-bretton-woods-iii-the-framework-1/2/) Published: 2025-10-25 Description: How freezing Russian reserves sparked Bretton Woods III: Pozsar's framework on inside money, outside money, and the shift to a commodity-backed monetary order. Summary: In March 2022, as Western nations imposed unprecedented sanctions following Russia's invasion of Ukraine, Zoltan Pozsar published a series of dispatches that would become some of the most discussed pieces in financial markets that year. The core thesis was stark: we were witnessing the birth of 'Bretton Woods III,' a fundamental shift in how the global monetary system operates. Nearly three years later, with more data on de-dollarization trends, commodity market dynamics, and structural changes in global trade, it's worth revisiting this framework. Key findings: - Freezing Russian reserves in 2022 introduced confiscation risk to assets previously considered risk-free, triggering a shift from inside money (Treasuries) to outside money (commodities, gold). - Non-U.S. banks hold $16 trillion in dollar assets but lack access to the Fed's emergency facilities, creating structural vulnerability whenever dollars become scarce globally. - Rerouting Russian oil from 2-week Baltic-to-Europe voyages to 4-month Asia routes tied up roughly 10% of global VLCC capacity and multiplied commodity financing demands. - Perry Mehrling's four prices of money (par, interest, exchange rate, price level) form Pozsar's analytical backbone, with central banks able to manage the first three but not commodity-driven inflation. ### [Dual Mandate Tensions](http://philippdubach.com/posts/dual-mandate-tensions/) Published: 2025-05-21 Description: New NBER paper shows optimal Fed policy should partially accommodate tariff inflation, exposing a fault line in the dual mandate when prices and jobs conflict. Summary: Something interesting just happened at the National Bureau of Economic Research (NBER): 'We study the optimal monetary policy response to the imposition of tariffs in a model with imported intermediate inputs. In a simple open-economy framework, we show that a tariff maps exactly into a cost-push shock in the standard closed-economy New Keynesian model, shifting the Phillips curve upward. We then characterize optimal monetary policy, showing that it partially accommodates the shock to smooth the transition to a more distorted long-run equilibrium—at the cost of higher short-run inflation.' Key findings: - NBER research shows tariffs map directly into cost-push shocks, shifting the Phillips curve upward and forcing the Fed to choose between inflation and employment. - Optimal monetary policy would partially accommodate tariff inflation, allowing prices to overshoot to smooth the transition rather than crushing output with rate hikes. - The dual mandate was never designed for scenarios where price stability and maximum employment point in opposite directions simultaneously. ## Economics (6 articles) ### [When AI Labs Become Defense Contractors](http://philippdubach.com/posts/when-ai-labs-become-defense-contractors/) Published: 2026-03-01 Description: The Anthropic-Pentagon standoff isn't an ethics story. It's a replay of the 1993 Last Supper that consolidated 51 defense primes into 5, at Silicon Valley speed. Summary: Lockheed started by building Amelia Earhart's favorite plane.
Then came a government loan guarantee in 1971 (the L-1011 TriStar nearly killed the company), a Cold War, decades of consolidation, and now a business that earns 92.5% of its revenue from government contracts, with the F-35 alone accounting for 26% of its $71 billion in annual sales. The process took about 50 years. AI labs becoming defense contractors will happen faster. On February 27, 2026, two things happened within hours of each other. President Trump ordered every federal agency to 'IMMEDIATELY CEASE all use of Anthropic's technology' after CEO Dario Amodei refused to strip safety constraints from Claude's Pentagon deployment, specifically prohibitions on mass domestic surveillance and fully autonomous weapons. Defense Secretary Pete Hegseth then labeled Anthropic a 'Supply-Chain Risk to National Security,' a designation previously reserved for foreign adversaries like Huawei, never before applied to an American company. That evening, Sam Altman announced that OpenAI had signed a deal to deploy its models on the Pentagon's classified network, posting that the Department of War 'displayed a deep respect for safety.' (Whether that reflects the Pentagon's actual position or Altman's political optimism remains unclear for now.) Key findings: - The FY2026 Pentagon AI budget jumped to $13.4 billion from $1.8 billion, a 7x increase in a single budget cycle, now larger than Anthropic's entire annualized revenue of $14 billion. - After the 1993 Last Supper, 51 prime defense contractors collapsed into 5 within four years. AI labs face the same consolidation logic, just faster: through classified network access and government-funded compute rather than M&A. - IDIQ contracts account for 56% of DoD award dollars and run five years with extensions. Once embedded in classified systems with a security-cleared workforce (243-day average clearance processing), switching costs become close to prohibitive. - Palantir's trajectory previews the endgame: $4.48 billion FY2025 revenue (up 56%), 53.7% from government, now worth nearly twice Boeing at $320 billion market cap. ### [Economics of a Super Bowl Ad](http://philippdubach.com/posts/economics-of-a-super-bowl-ad/) Published: 2026-02-20 Description: A 30-second Super Bowl spot costs $8M. The real price is $16–23M. The ROI evidence is mixed. A deep look at the pricing, the prisoner's dilemma, and the NFL. Summary: A 30-second Super Bowl ad costs $8 million. That's $267,000 per second, roughly the median U.S. home price for every tick of the clock. Super Bowl LX drew 124.9 million average viewers with a peak of 137.8 million, the highest peak audience in American television history. The NFL accounted for 84 of the top 100 most-watched U.S. telecasts in 2025. The Oscars, by comparison, managed 19.7 million. Zachariah Reitano, CEO of the direct-to-patient telehealth company Ro, writing from direct experience as a 2026 Super Bowl advertiser, published a detailed cost breakdown based on his own spending and interviews with 10+ brands. The picture that emerges is considerably more expensive than the headline number. Production runs $1–4 million for studio, crew, and post-production before any famous face enters the frame. Celebrity endorsement talent adds $1–5 million, with the current A-list sweet spot at $3–5 million according to WME agent Tim Curtis. Then comes the companion buy: for every 30-second slot, advertisers are generally required to commit to spending an equivalent amount on other programs broadcast by the same network.
For NBC's 2026 Super Bowl, that meant additional inventory across the Winter Olympics and NBA All-Star Game, adding another $7–10 million to the tab. Key findings: - A 30-second Super Bowl spot costs $8M but $16–23M fully loaded with production, talent, and mandatory companion buys - Stanford research shows competing brands both advertising cancels out the benefit, a prisoner's dilemma the NFL exploits for rising prices - The NFL is the last monoculture in American media: 84 of the top 100 most-watched US telecasts in 2025 - A single Super Bowl ad generates the same brand-search engagement as 1,056 typical primetime ads ### [Ozempic is Reshaping the Fast Food Industry](http://philippdubach.com/posts/ozempic-is-reshaping-the-fast-food-industry/) Published: 2026-01-16 Description: Cornell research: GLP-1 users cut grocery spending 5.3%, fast food 8%. With 16% household adoption and savory snacks down 10%, food stocks face headwinds. Summary: Something strange is happening in the food industry. New US dietary guidelines call for more protein and less sugar. Greggs, the UK bakery chain, just warned of 'flatlining profits' in the food-to-go market. Food companies are racing to overhaul their brands, ditching artificial dyes and packing protein into products. Earnings calls across the sector blame 'inflation' and 'subdued consumer confidence.' Nobody mentions the elephant in the room: GLP-1 medications. New research from Cornell finally puts numbers to what the food industry doesn't want to discuss. Using transaction data from 150,000 households linked to survey responses on medication adoption, Sylvia Hristakeva, Jūra Liaukonytė, and Leo Feler tracked exactly how Ozempic and Wegovy users change their spending. The results deserve attention from anyone holding food stocks. Key findings: - Cornell research on 150,000 households shows GLP-1 users cut grocery spending 5.3% within six months, with fast food down 8.0% and savory snacks hit hardest at 10.1% - 16.3% of U.S. households already have at least one GLP-1 user as of July 2024, with nearly half taking the medication for weight loss rather than diabetes - About 34% of users discontinue GLP-1 medications, and when they stop, candy and chocolate purchases rise 11.4% above pre-adoption levels, suggesting the drugs suppress appetite without teaching new habits - High-income households show steeper spending declines at 8.2%, and these are the most profitable fast food customers, creating a double loss of volume and margin ### [Does AI mean the demand on labor goes up?](http://philippdubach.com/posts/does-ai-mean-the-demand-on-labor-goes-up/) Published: 2026-01-15 Description: AI was supposed to free us. The Jevons paradox plays out in real time: efficiency expands workload, not leisure. 77% of workers say AI added to their work. Summary: Joe Weisenthal from Bloomberg, this week: All my shower thoughts now are about designing efficient workflows for synthesizing, collecting, labeling and annotating data. Same. Since I started building every app and tool I thought would make my life easier, my workflow more efficient, I haven't stopped. Apparently non-developers are now writing apps instead of buying them. This is the AI productivity paradox in miniature: the tools get better and we do more, not less. 
Key findings: - Workers in AI-exposed occupations now work roughly 3 extra hours per week, and leisure time has dropped by the same amount, according to NBER research - 77% of employees say AI tools have added to their workload, not reduced it, per Upwork's survey data - Only 21% of employees use time saved by AI for personal life, with the rest reinvesting it directly back into work - The Jevons paradox from 1865 predicted this: more efficient steam engines increased coal consumption, and more efficient AI tools are increasing work output expectations the same way ### [Agent-based Systems for Modeling Wealth Distribution](http://philippdubach.com/posts/agent-based-systems-for-modeling-wealth-distribution/) Published: 2025-08-30 Description: Agent-based modeling shows how random market transactions naturally produce extreme wealth concentration, and why even a small wealth tax changes everything. Summary: A question Gary Stevenson, the self-proclaimed best trader in the world, has been asking for some time is whether a wealth tax can fix Britain's economy. [...] he believed the continued parlous state of the economy would halt any interest rate hikes. The reason? Because when ordinary people receive money, they spend it, stimulating the economy, while the wealthy tend to save it. But our economic model promotes the concentration of wealth among a select few at the expense of everybody else's living standards. Key findings: - The Affine Wealth Model matches 27 years of U.S. wealth data with less than 0.16% average error, showing extreme concentration emerges from random transactions alone. - Even a 1% wealth tax shifts the simulated distribution from Pareto extremes to a stable equilibrium where top agents hold at most 3-4x their starting wealth. - Roughly 10% of the U.S. population holds negative net worth, a feature the Affine Wealth Model captures by allowing agents to go below zero. (A minimal simulation sketch of these exchange dynamics appears at the end of this section.) ### [Behavioral Economics & Transit Policy](http://philippdubach.com/posts/behavioral-economics-transit-policy/) Published: 2025-06-22 Description: The zero price effect explains why politicians love free transit proposals. But making buses free might weaken the rider advocacy that protects service quality. Summary: Over the weekend a WSJ editorial on the 2025 New York City mayoral election called one of the potential Democratic candidates Zohran Mamdani 'a literal socialist' for - among other things - running on the promise of free bus rides for all: Zohran won New York's first fare-free bus pilot on five lines across the city. As Mayor, he'll permanently eliminate the fare on every city bus [...] Fast and free buses will not only make buses reliable and accessible but will improve safety for riders and operators – creating the world-class service New Yorkers deserve. Key findings: - The zero price effect means the gap between $0.75 and free feels far larger than the gap between $2.75 and $0.75, even though the dollar savings is smaller. - Free transit activates social norms like gratitude and civic participation, while any positive price forces cost-benefit thinking that makes riders demand accountability. - Fare-paying riders create a natural constituency that defends transit budgets during cuts, meaning free transit could make it politically easier to slash service quality. - Congestion pricing works because even a modest $5 charge shifts drivers from automatic habit to deliberate calculation about each trip.
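The Affine Wealth Model entry above lends itself to a quick illustration. Below is a minimal sketch of the underlying yard-sale exchange dynamics with a flat redistribution lever standing in for the wealth tax; all parameters are hypothetical, and the full model's wealth-attained-advantage and negative-wealth terms are omitted.

```python
import numpy as np

# Toy yard-sale exchange model: repeated coin-flip trades over a fraction
# of the poorer agent's wealth, with an optional flat levy redistributed
# equally (a stand-in for the article's wealth tax). Hypothetical parameters.
rng = np.random.default_rng(42)

N = 1000            # agents
T = 100_000         # pairwise transactions
w = np.ones(N)      # equal starting wealth
stake_frac = 0.1    # fraction of the poorer agent's wealth at risk per trade
tax_rate = 0.0      # per-transaction wealth levy; try 0.001 to see stabilization

for _ in range(T):
    i, j = rng.choice(N, size=2, replace=False)
    stake = stake_frac * min(w[i], w[j])
    if rng.random() < 0.5:          # fair coin flip, no agent is favored
        w[i] += stake
        w[j] -= stake
    else:
        w[i] -= stake
        w[j] += stake
    if tax_rate > 0:                # levy a flat share and redistribute it equally
        levy = tax_rate * w
        w = w - levy + levy.sum() / N

w_sorted = np.sort(w)
print(f"top 1% share:     {w_sorted[-N // 100:].sum() / w.sum():.1%}")
print(f"bottom 50% share: {w_sorted[:N // 2].sum() / w.sum():.1%}")
```

Left to run long enough with tax_rate = 0, the coin-flip exchanges alone concentrate wealth in a handful of agents; a small positive levy pulls the distribution back toward a stable equilibrium, which is the qualitative behavior the article attributes to a 1% wealth tax.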
## Medicine (5 articles) ### [Novo Was Europe's Most Valuable Company](http://philippdubach.com/posts/novo-was-europes-most-valuable-company/) Published: 2026-02-23 Description: Novo Nordisk lost 75% since June 2024. CagriSema failed vs Zepbound, US pricing is resetting lower, and Lilly leads on every axis. Full breakdown with numbers. Summary: Novo Nordisk was Europe's most valuable company 20 months ago. Today its market capitalization falls behind ASML, LVMH, Hermès, L'Oréal, SAP, Prosus, Siemens, Inditex, Deutsche Telekom, and Santander. The stock has lost roughly 75% since its June 2024 peak of $142.44, falling from a $640 billion market cap to under $160 billion. Shares dropped another 16% this morning after CagriSema, the follow-on obesity drug that was supposed to restore Novo's competitive story, failed its head-to-head trial against Eli Lilly's Zepbound. The REDEFINE 4 results confirm what a former Novo advisor told AlphaSense back in December: CagriSema is 'not particularly impressive.' Key findings: - Novo Nordisk lost 75% since June 2024 ($640B to under $160B) and guided for its first revenue decline in modern history, with 2026 adjusted sales down 5-13% - CagriSema lost its head-to-head against Zepbound in REDEFINE 4, trailing by 2.5 to 3.4 percentage points on weight loss, eliminating Novo's best competitive argument - Semaglutide's compound patent lapsed in Canada after Novo missed a CAD 250 maintenance fee, and Dr. Reddy's has filed generics in 87 countries - Novo trades at roughly 11x forward earnings after today's crash, a nearly 80% PE compression from its 2024 peak of ~50x, cheaper than every large-cap pharma peer including Pfizer ### [Ozempic is Reshaping the Fast Food Industry](http://philippdubach.com/posts/ozempic-is-reshaping-the-fast-food-industry/) Published: 2026-01-16 Description: Cornell research: GLP-1 users cut grocery spending 5.3%, fast food 8%. With 16% household adoption and savory snacks down 10%, food stocks face headwinds. Summary: Something strange is happening in the food industry. New US dietary guidelines call for more protein and less sugar. Greggs, the UK bakery chain, just warned of 'flatlining profits' in the food-to-go market. Food companies are racing to overhaul their brands, ditching artificial dyes and packing protein into products. Earnings calls across the sector blame 'inflation' and 'subdued consumer confidence.' Nobody mentions the elephant in the room: GLP-1 medications. New research from Cornell finally puts numbers to what the food industry doesn't want to discuss. Using transaction data from 150,000 households linked to survey responses on medication adoption, Sylvia Hristakeva, Jūra Liaukonytė, and Leo Feler tracked exactly how Ozempic and Wegovy users change their spending. The results deserve attention from anyone holding food stocks. Key findings: - Cornell research on 150,000 households shows GLP-1 users cut grocery spending 5.3% within six months, with fast food down 8.0% and savory snacks hit hardest at 10.1% - 16.3% of U.S. 
households already have at least one GLP-1 user as of July 2024, with nearly half taking the medication for weight loss rather than diabetes - About 34% of users discontinue GLP-1 medications, and when they stop, candy and chocolate purchases rise 11.4% above pre-adoption levels, suggesting the drugs suppress appetite without teaching new habits - High-income households show steeper spending declines at 8.2%, and these are the most profitable fast food customers, creating a double loss of volume and margin ### [GLP-1 Receptor Agonists in ASUD Treatment](http://philippdubach.com/posts/glp-1-receptor-agonists-in-asud-treatment/) Published: 2025-11-21 Description: A phase 2 RCT shows low-dose semaglutide reduces alcohol craving and heavy drinking with effect sizes exceeding naltrexone. What GLP-1 means for AUD treatment. Summary: Alcohol and other substance use disorders (ASUDs) are complex, multifaceted, but treatable medical conditions with widespread medical, psychological, and societal consequences. However, treatment options remain limited, therefore the discovery and development of new treatments for ASUDs is critical. Glucagon-like peptide-1 receptor agonists (GLP-1RAs), currently approved for the treatment of type 2 diabetes mellitus, obesity, and obstructive sleep apnea, have recently emerged as potential new pharmacotherapies for ASUDs. Semaglutide, the GLP-1 receptor agonist marketed as Ozempic and Wegovy, may be the most significant new pharmacotherapy candidate for alcohol use disorder in decades. This development matters most for people struggling with substance use disorders who have few effective treatment options. It also matters for manufacturers like Novo Nordisk facing patent expiration pressures on Ozempic. The research into GLP-1RAs for addiction treatment is early but notable given the limited pharmacotherapy options currently available for ASUDs. In February 2025, researchers at UNC published results from the first randomized controlled trial of semaglutide for ASUD treatment. The phase 2 trial enrolled 48 non-treatment-seeking adults with AUD and administered low-dose semaglutide (0.25 mg/week for 4 weeks, 0.5 mg/week for 4 weeks - standard dosing for weight loss reaches 2.4 mg per week) over 9 weeks. Participants on semaglutide consumed less alcohol in controlled laboratory settings and reported fewer drinks per drinking day in their normal lives. They also reported less craving for alcohol. Heavy drinking episodes declined more sharply in the semaglutide group compared to placebo over the nine-week trial. The mechanism likely involves GLP-1 receptors in the brain's mesolimbic reward pathway, where semaglutide modulates dopamine signaling to reduce the reinforcing effects of alcohol consumption. Despite the low doses, effect sizes for some drinking outcomes exceeded those typically seen with naltrexone, one of the few FDA-approved medications for alcohol use disorder. A large real-world study of 83,825 patients with obesity found semaglutide associated with a 50-56% lower risk of AUD incidence and recurrence compared to other anti-obesity medications. While larger trials are needed to confirm these results, the early evidence suggests GLP-1 may offer a meaningful treatment option for a condition where new therapies have been approved at a rate of roughly one every 25 years. Phase 3 trials evaluating semaglutide for AUD are now underway, and pemvidutide, a GLP-1/glucagon dual receptor agonist, has received FDA Fast Track designation for alcohol use disorder. 
Key findings: - A phase 2 RCT found low-dose semaglutide (0.25-0.5 mg/week, far below the 2.4 mg weight loss dose) reduced alcohol craving and heavy drinking episodes versus placebo. - Effect sizes for some drinking outcomes exceeded those typically seen with naltrexone, one of only three FDA-approved medications for alcohol use disorder. - New AUD therapies have been approved at a rate of roughly one every 25 years, making GLP-1 receptor agonists the most promising new pharmacotherapy class in decades. ### [Novo Nordisk's Post-Patent Strategy](http://philippdubach.com/posts/novo-nordisks-post-patent-strategy/) Published: 2025-06-29 Description: Amycretin's Phase 1 data shows 24.3% weight loss, beating Wegovy and Zepbound. How Novo Nordisk plans to replace its $20B Ozempic franchise before 2031. Summary: Novo Nordisk, a longtime member of my 'regrets' stock list, has become reasonably affordable lately (-48% yoy). Part of the reason is that they currently sit atop a ~$20 billion Ozempic/Wegovy franchise that faces patent expiration in 2031. That's roughly seven years to replace their blockbuster drug. We revisit them today, since per newly published Lancet data, Novo's lead replacement candidate—amycretin—just posted some genuinely impressive Phase 1 results. The injectable version delivered 24.3% average weight loss versus 1.1% for placebo, beating both current market leaders (Wegovy at 15% and Lilly's Zepbound at 22.5%). Even the oral version hit 13.1% weight loss in just 12 weeks, with patients still losing weight when the trial ended. Key findings: - Amycretin delivered 24.3% average weight loss in Phase 1, beating both Wegovy at 15% and Zepbound at 22.5%, with dose-response curves that overlapped across groups. - Novo Nordisk's core Ozempic patent expires December 2031, but complex peptide manufacturing and patented injection devices create a capacity constraint moat that generic competitors cannot quickly replicate. - The oral amycretin version hit 13.1% weight loss in just 12 weeks with patients still losing weight when the trial ended. - Novo is down 48% year-over-year, with Martin Shkreli's model putting fair value at 705 DKK, roughly 21% upside from current levels. ### [AlphaFold 3: Free for Science](http://philippdubach.com/posts/alphafold-3-free-for-science/) Published: 2024-05-12 Description: Google released AlphaFold 3 for free, predicting molecular interactions beyond proteins. But is this generosity or a cloud platform play to own drug discovery? Summary: Nothing says 'we're serious about dominating a market' quite like giving away breakthrough technology for free. Google's latest move with AlphaFold 3 might be their most audacious version of this strategy yet. 'AlphaFold 3 can predict the structure and interactions of all of life's molecules with unprecedented accuracy.' This isn't just an incremental improvement: while previous versions of AlphaFold could predict protein structures, AlphaFold 3 models the interactions between proteins, DNA, RNA, and small molecules. It's the difference between having a parts catalog and understanding how the entire machine works.
Key findings: - AlphaFold 3 predicts molecular interactions across proteins, DNA, RNA, and small molecules with 50% better accuracy than prior methods, not just static protein structures - Google released it free for academic use while keeping commercial rights through Isomorphic Labs, which has signed deals worth up to $3B with Eli Lilly and Novartis - The strategy mirrors Google's classic platform play: build dependency on free infrastructure, then monetize the ecosystem once adoption is entrenched ## Tech (10 articles) ### [The Last Architecture Designed by Hand](http://philippdubach.com/posts/the-last-architecture-designed-by-hand/) Published: 2026-03-16 Description: The transformer's limits are now mathematical proofs, not empirical hunches. Hybrids are in production. AI is searching for its own replacement. Here's what comes after. Summary: I bet there is another new architecture to find that is gonna be as big of a gain as transformers were over LSTMs. Sam Altman, the CEO of the company most invested in the transformer is telling a room of students it isn't the final form. So what comes after the transformer? He's probably right that something will, and the evidence is no longer anecdotal. Several recent papers have proved that the transformer's worst properties are structural, not engineering problems to be fixed with better data or more compute, but mathematical lower bounds. Key findings: - Mathematical proofs now show that quadratic scaling, hallucination, and positional bias are structural properties of the transformer, not fixable with better training data or RLHF. - Over 60% of frontier models released in 2025 use Mixture of Experts, and production hybrids like Jamba and Qwen3-Next blend attention with state space models at 3x throughput. - AlphaEvolve found a 23% speedup inside Gemini's own architecture, cutting training time by 1% and recovering 0.7% of Google's total compute resources. - OpenAI's inference spending hit $2.3 billion in 2024, 15x what they spent training GPT-4.5, meaning the economic center of gravity has already shifted from training to inference. ### [MCP vs A2A in 2026: How the AI Protocol War Ends](http://philippdubach.com/posts/mcp-vs-a2a-in-2026-how-the-ai-protocol-war-ends/) Published: 2026-03-15 Description: MCP leads with 97M monthly SDK downloads and 10,000+ servers. A2A fills a different layer. Analysis of the agentic AI standards war with historical parallels. Summary: On March 26, 2025, Sam Altman posted the following three sentences people love MCP and we are excited to add support across our products. MCP is Anthropic's Model Context Protocol. OpenAI is Anthropic's most direct competitor. Altman was endorsing a rival's standard. That post may be the most significant event in enterprise AI infrastructure this year. When your main competitor adopts your protocol, the war is close to over. I've been watching this play out since Anthropic launched MCP in November 2024, and I want to work through what's happening: who controls what, what 'interoperability' means in practice, and whether any of this follows patterns we've seen before. Key findings: - MCP reached 10,000+ servers and 97 million monthly SDK downloads before A2A launched, compounding a five-month head start into a structural ecosystem lead. - OpenAI adopting MCP in March 2025 mirrors the iMac's USB-only bet in 1998: one player so central to the ecosystem that their adoption made the standard inescapable. 
- The agentic AI market is $7-8 billion in 2025, with analyst projections ranging from $50 billion to $199 billion by 2034 at 40-50% annual growth. - 53% of MCP servers still rely on static credentials rather than OAuth, and a critical npm package vulnerability (CVE-2025-6514) exposed 437,000+ installations to shell injection. ### [93% of Developers Use AI Coding Tools. Productivity Hasn't Moved.](http://philippdubach.com/posts/93-of-developers-use-ai-coding-tools.-productivity-hasnt-moved./) Published: 2026-03-04 Description: METR found experienced developers 19% slower with AI, despite feeling 20% faster. At 92.6% adoption, organizational productivity gains remain roughly 10%. Summary: A study published in July 2025 gave AI coding tools their most credible test yet. Sixteen experienced open-source developers, 246 real tasks, randomized controlled design. The researchers expected to measure how much faster AI made them. What they found: developers using AI took 19% longer to complete tasks than those working without it. The developers themselves thought they were 20% faster. That 39-point gap between perception and reality is the most important number in METR's paper. It lands inside two years of adoption data pointing in the opposite direction. DX surveyed 121,000 developers across 450+ companies and found 92.6% use AI coding tools at least monthly. JetBrains' AI Pulse measured 93%. The DORA 2025 report put it at 90%. On the productivity side: six independent research efforts converge on roughly the same ceiling, 10% at the system level, if you're being generous. Key findings: - A randomized controlled study found experienced developers using AI took 19% longer to complete tasks while believing they were 20% faster, a 39-point perception gap. - Writing code accounts for 25-35% of the software development lifecycle, so even a 100% coding speedup yields at most 15-25% system improvement under Amdahl's Law. - Teams with high AI adoption merged 98% more pull requests but saw review time increase 91%, with DORA delivery metrics unchanged across 10,000+ developers. - At 92.6% monthly adoption and 27% of production code AI-generated, six independent research efforts converge on roughly 10% organizational productivity gains. ### [Building a No-Tracking Newsletter from Markdown to Distribution](http://philippdubach.com/posts/building-a-no-tracking-newsletter-from-markdown-to-distribution/) Published: 2025-12-24 Description: Build a privacy-focused newsletter with Python, Cloudflare Workers KV, and Resend API. Zero tracking, zero cost, full control. Open source code included. Summary: Friends have been asking how they can stay up to date with what I'm working on and keep track of the things I read, write, and share. RSS feeds don't seem to be en vogue anymore, apparently. So I built a mailing list. What else would you do over the Christmas break? Key findings: - The entire newsletter pipeline runs at zero cost using Cloudflare Workers KV for subscribers, R2 for hosting, and Resend's free tier for 3,000 emails per month. - The system sends no tracking pixels, no click tracking, and no external analytics, just rendered HTML from Markdown with table-based layout for email client compatibility. - A Python engine fetches OpenGraph metadata, generates LinkedIn-style preview cards, and optimizes images at 240px width for retina displays, all automated from a single .md file. 
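The newsletter entry above describes a simple pipeline: render a Markdown issue to HTML and deliver it through Resend with no tracking. A minimal sketch of the send step follows; the file name, sender address, and the hard-coded recipient list are placeholders (in the article's setup the subscriber list lives in Cloudflare Workers KV), and it assumes Resend's standard /emails endpoint plus the markdown and requests packages.

```python
import os

import markdown   # pip install markdown
import requests   # pip install requests

RESEND_API_KEY = os.environ["RESEND_API_KEY"]

# Render the issue from a single Markdown file to HTML, as in the article.
with open("issue-001.md", encoding="utf-8") as f:
    html_body = markdown.markdown(f.read())

# Placeholder list; the real pipeline would pull addresses from Workers KV.
subscribers = ["reader@example.com"]

for address in subscribers:
    resp = requests.post(
        "https://api.resend.com/emails",
        headers={"Authorization": f"Bearer {RESEND_API_KEY}"},
        json={
            "from": "newsletter@example.com",   # placeholder sender
            "to": [address],
            "subject": "Issue 001",
            "html": html_body,                  # no tracking pixels, no click rewriting
        },
        timeout=10,
    )
    resp.raise_for_status()
```

Sending one request per subscriber, rather than batching addresses into a single call, keeps recipients from seeing each other and stays within the spirit of the zero-tracking design.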
### [Deploying to Production with AI Agents: Testing Cursor on Azure](http://philippdubach.com/posts/deploying-to-production-with-ai-agents-testing-cursor-on-azure/) Published: 2025-11-30 Description: Cursor AI deployed YOURLS on Azure via SSH in 15 minutes: server config, MySQL, SSL, and a custom plugin. Full transcript and tutorial available. Summary: I've been curious about Cursor's capabilities for a while, but never had a good reason to try it. This weekend I decided to host my own URL shortener and deployed YOURLS, a free and open-source link shortener, on a fresh Azure VM. It seemed like a solid test case since it involves SSH access, server configuration, database setup, and SSL certificates. If an AI assistant could handle that end-to-end, it would be genuinely useful. Key findings: - Cursor deployed YOURLS on a fresh Azure VM via SSH in about 15 minutes, handling server config, MySQL, Apache, SSL, and a custom plugin with no manual intervention - The same deployment previously took at least an hour of manual work and troubleshooting across multiple steps - When asked to build a custom YOURLS plugin for date-prefixed short URLs, Cursor wrote it correctly on the first attempt, and the shortener is live at pdub.click ### [Visualizing Gradients with PyTorch](http://philippdubach.com/posts/visualizing-gradients-with-pytorch/) Published: 2025-08-23 Description: Build the right mental model for gradients with this PyTorch visualization tool. 2D surface plots with gradient vectors show the direction of steepest ascent. Summary: Gradients are one of the most important concepts in calculus and machine learning, but they're often poorly understood. Trying to understand them better myself, I wanted to build a visualization tool that helps me develop the correct mental picture of what the gradient of a function is. I came across GistNoesis/VisualizeGradient, so I went on from there to write my own iteration. This mental model generalizes beautifully to higher dimensions and is the foundation for understanding optimization algorithms like gradient descent. The colored surface shows function values. Black arrows show gradient vectors in the input plane (x-y space), pointing toward the direction of steepest ascent. Key findings: - Gradient vectors live in the input plane and point toward steepest ascent, which is why moving opposite to them in gradient descent moves toward lower loss values. - Surface plots with overlaid gradient arrows show the relationship between function terrain and optimization direction more clearly than contour plots alone. - The same gradient intuition from 2D visualizations generalizes directly to neural networks with millions of parameters, where each component is a partial derivative with respect to one weight. ### [Counting Cards with Computer Vision](http://philippdubach.com/posts/counting-cards-with-computer-vision/) Published: 2025-07-06 Description: How I trained a YOLOv11 model to detect playing cards at 99.5% accuracy and built a real-time Monte Carlo blackjack odds calculator using computer vision. Summary: After installing Claude Code, the agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands, I was looking for a task to test its abilities. Fairly quickly we wrote fewer than 200 lines of Python code predicting blackjack odds using Monte Carlo simulation. When I went on to test this little tool on Washington Post's online blackjack (I also didn't know that existed!)
I quickly noticed how impractical it was to manually input all the card values on the table. What if the tool could also handle blackjack card detection automatically and calculate the odds from it? I have never done anything with computer vision so this seemed like a good challenge. To get to any reasonable result we have to start with classification, where we 'teach' the model to categorize data by showing it lots of examples with correct labels. But where do the labels come from? I manually annotated 409 playing cards across 117 images using Roboflow Annotate (at first I only did half as many - why this wasn't a good idea we'll see in a minute). Once enough screenshots of cards were annotated, we could train the model to recognize the cards and predict card values on tables it has never seen before. I was able to use an NVIDIA T4 GPU inside Google Colab, which offers some GPU time for free when capacity is available. During training, the algorithm learns patterns from this example data, adjusting its internal parameters millions of times until it gets really good at recognizing the differences between categories (in this case different cards). Once trained, the model can then make predictions on new, unseen data by applying the patterns it learned. With the annotated dataset ready, it was time to implement the actual computer vision model. I chose to build on Ultralytics' pre-trained YOLOv11 model, a leading object detection algorithm. I set up the environment in Google Colab following the 'How to Train YOLO11 Object Detection on a Custom Dataset' notebook. After extracting the annotated dataset from Roboflow, I began training the model using the pre-trained YOLOv11s weights as a starting point. This approach, called transfer learning, allows the model to reuse patterns already learned from millions of general images and adapt them to this specific task. I initially set it up to run for 350 epochs, though the model's built-in early stopping mechanism kicked in after 242 epochs when no improvement was observed for 100 consecutive epochs. The best results were achieved at epoch 142, taking around 13 minutes to complete on the Tesla T4 GPU. The initial results were quite promising, with an overall mean Average Precision (mAP) of 80.5% at IoU threshold 0.5. Most individual card classes achieved good precision and recall scores, with only a few cards like the 6 and Queen showing slightly lower precision values. However, looking at the confusion matrix and loss curves revealed some interesting patterns. While the model was learning effectively (as shown by the steadily decreasing loss), there were still some misclassifications between similar cards, particularly among the numbered cards. This highlighted exactly why I mentioned earlier that annotating only half the amount of data initially 'wasn't a good idea' - more training examples would likely improve these edge cases and reduce confusion between similar-looking cards. My first attempt at solving the remaining accuracy issues was to add another layer to the workflow by sending the detected cards to Anthropic's Claude API for additional OCR processing. This hybrid approach was very effective - combining YOLO's object detection, which dynamically crops the blackjack table down to individual cards, with Claude's advanced vision capabilities yielded 99.9% accuracy on the predicted cards.
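The crop-then-read hybrid just described is easy to picture in code. The sketch below is an illustrative reconstruction, not the article's implementation: the weights file, image paths, model name, and prompt wording are assumptions, and it pairs Ultralytics' YOLO API for detection with Anthropic's Messages API for the second-pass card read.

```python
import base64

import anthropic                  # pip install anthropic
from PIL import Image             # pip install pillow
from ultralytics import YOLO      # pip install ultralytics

# First pass: local YOLO detection finds each card on the table screenshot.
model = YOLO("cards_yolo11s.pt")                     # assumed fine-tuned weights
result = model("blackjack_table.png")[0]

client = anthropic.Anthropic()                       # reads ANTHROPIC_API_KEY from the env
table = Image.open("blackjack_table.png")
cards = []

for box in result.boxes.xyxy.tolist():
    x1, y1, x2, y2 = map(int, box)
    table.crop((x1, y1, x2, y2)).save("card_crop.png")
    with open("card_crop.png", "rb") as f:
        payload = base64.standard_b64encode(f.read()).decode()

    # Second pass: ask the vision model to read the cropped card.
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",            # placeholder model name
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64",
                                             "media_type": "image/png",
                                             "data": payload}},
                {"type": "text", "text": "Which playing card is this? Reply like 'Qh' or '10s'."},
            ],
        }],
    )
    cards.append(msg.content[0].text.strip())

print(cards)
```

One API round trip per detected card is exactly where the latency described next comes from; the local-only YOLO path avoids it at the cost of the residual misclassifications.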
However, this solution came with a significant drawback: the additional API layer added latency, and the large model's processing overhead made it impractical for real-time gameplay. Key findings: - YOLOv11 trained on 409 annotated playing cards achieved 99.5% mAP@50, up from 80.5% after doubling annotations and fixing bounding polygon errors. - Local YOLO inference runs at 45.5ms per image, roughly 40x faster than Roboflow's hosted API at 4 seconds, making real-time card detection practical. - Claude's Vision API achieved 99.9% accuracy on card recognition but was too slow for gameplay, while EasyOCR failed to detect roughly half the cards entirely. ### [Modeling Glycemic Response with XGBoost](http://philippdubach.com/posts/modeling-glycemic-response-with-xgboost/) Published: 2025-05-30 Description: A hands-on project predicting postprandial glucose curves with XGBoost, Gaussian curve fitting, and 27 engineered features from CGM data. Code on GitHub. Summary: Earlier this year I wrote about how I built a CGM data reader after wearing a continuous glucose monitor myself. Since I was already logging my macronutrients and learning more about molecular biology in an MIT MOOC, I became curious: given a meal's macronutrients (carbs, protein, fat) and some basic individual characteristics (age, BMI), could a machine learning model predict the shape of my postprandial glucose curve? I came across Zeevi et al.'s paper on Personalized Nutrition by Prediction of Glycemic Responses, which used machine learning to predict individual glycemic responses from meal data. Exactly what I had in mind. Unfortunately, neither the data nor the code were publicly available. So I decided to build my own model. In the process I wrote this working paper. Key findings: - XGBoost with 27 engineered features predicted glucose spike amplitude at R-squared 0.46, nearly double the linear regression baseline of 0.24, but time-to-peak prediction was worse than guessing the average. - Meal composition tells you something about how high your blood sugar will rise but almost nothing about when it peaks or how long it stays elevated. - EPFL's Food and You study with 1,000+ participants achieved a correlation of 0.71 using the same XGBoost approach, confirming that data quantity matters more than model complexity. - Adding gut microbiome data increased explained variance in glucose peaks from 34% to 42%, showing how much meal composition alone leaves on the table. ### [I Built a CGM Data Reader](http://philippdubach.com/posts/i-built-a-cgm-data-reader/) Published: 2025-01-02 Description: Built a CGM data analysis tool with Python to visualize Freestyle Libre 3 glucose data alongside nutrition, workouts, and sleep for endurance cycling. Summary: If you're reading this, you might also be interested in: Modeling Glycemic Response with XGBoost. Last year I put a Continuous Glucose Monitor (CGM) sensor, specifically the Abbott Freestyle Libre 3, on my left arm. Why? I wanted to optimize my nutrition for endurance cycling competitions. Where I live, the sensor is easy to get—without any medical prescription—and even easier to use. Unfortunately, Abbott's FreeStyle LibreLink app is less than optimal (3,250 other people with an average rating of 2.9/5.0 seem to agree). In their defense, the web app LibreView does offer some nice reports which can be generated as PDFs—not very dynamic, but still something! What I had in mind was more in the fashion of the Ultrahuman M1 dashboard.
Unfortunately, I wasn't allowed to use my Libre sensor (EU firmware) with their app (yes, I spoke to customer service). Key findings: - Abbott's LibreLink app has a 2.9/5.0 rating from 3,250 reviews, so I built a Python dashboard that merges Libre 3 glucose data with nutrition, workout, and sleep data from five sources - The dashboard overlays workout traces, meal macros, and sleep phases onto continuous glucose readings, letting you spot correlations the stock app cannot surface - Key metrics for endurance athletes include time-in-range, coefficient of variation, and pattern analysis around meals and training loads (a short metrics sketch appears at the end of this index) ### [The Tech behind this Site](http://philippdubach.com/posts/the-tech-behind-this-site/) Published: 2024-01-15 Description: A Hugo blog tech stack with Cloudflare R2 image hosting, responsive WebP shortcodes, Workers AI social automation, and GitHub Pages CI/CD deployment. Summary: This site runs on Hugo, deployed to GitHub Pages with Cloudflare CDN. Images are hosted on R2 (static.philippdubach.com) with automatic resizing and WebP conversion. The core challenge was responsive images. Standard markdown ![alt](url) doesn't support multiple sizes. I built a Hugo shortcode that generates picture elements with breakpoint-specific sources—upload once at full quality, serve optimized versions (320px mobile to 1600px desktop) automatically. Updates (March 2026) — Hugo upgrade: upgraded from Hugo v0.128.0 to v0.157.0. Migrated deprecated .Site.AllPages to .Site.Pages in the sitemap template and .Site.Data to site.Data across navigation, structured data, and research templates. Removed a dead readFile security config key from hugo.toml. No breaking changes, zero deprecation warnings. Key findings: - The site runs on Hugo with Cloudflare R2 image hosting, serving responsive WebP images from 320px to 1600px via a custom shortcode that generates picture elements from a single upload - Social media posts to Bluesky and Twitter are generated automatically by Cloudflare Workers running Llama 4 Scout 17B, with no manual intervention - A security headers Worker on Cloudflare injects HSTS, CSP, and COEP headers because GitHub Pages does not process _headers files natively
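The two headline metrics from the CGM dashboard entry above, time-in-range and coefficient of variation, are simple to compute from a raw export. Below is a minimal sketch, assuming a LibreView-style CSV with a historic glucose column in mg/dL; the file name, header offset, and column name are assumptions that vary by app version and export language.

```python
import pandas as pd   # pip install pandas

# LibreView exports typically carry a metadata line before the header row;
# adjust skiprows and the column name to match your own export.
df = pd.read_csv("libre_export.csv", skiprows=1)
glucose = pd.to_numeric(df["Historic Glucose mg/dL"], errors="coerce").dropna()

in_range = glucose.between(70, 180)          # consensus 70-180 mg/dL target band
time_in_range = in_range.mean()
cv = glucose.std() / glucose.mean()          # coefficient of variation

print(f"time in range:            {time_in_range:.1%}")
print(f"coefficient of variation: {cv:.1%}  (<= 36% is the usual stability target)")
```

From here, merging in meal and workout timestamps is what turns these raw numbers into the overlays the dashboard entry describes.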