# Working with Models

**Author:** Philipp D. Dubach | **Published:** November 8, 2025 | **Updated:** February 22, 2026
**Categories:** AI
**Keywords:** diffusion models explained, Stanford CS236 course, deep generative models tutorial, diffusion model forward process, Stefano Ermon lectures, generative AI architecture, AI model training foundations

## Key Takeaways

- Diffusion models work by gradually corrupting data into noise through a forward process, then learning to reverse that process to generate new samples.
- Stanford CS236 by Stefano Ermon covers the full mathematical foundations of deep generative models, from VAEs and GANs to score-based diffusion, and is freely available on YouTube.
- The shared mathematical ideas underlying diverse diffusion formulations trace back to linking data distributions to simple priors through a continuum of intermediate distributions.

---

There's this "[I work with Models](https://us1.discourse-cdn.com/flex001/uploads/ultralytics1/original/1X/45c604467b6f4212858281cf28f71a77083fb45e.jpeg)" joke, which I first heard years ago from an analyst working on a valuation model ([see my previous post](/posts/everything-is-a-dcf-model/)). It has become more relevant than ever:

> This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions.

If you want to dig into this topic properly, be sure to check out [Stefano Ermon's CS236 Deep Generative Models course](https://deepgenerativemodels.github.io). Full lecture recordings are also available on [YouTube](https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8).


---

## Frequently Asked Questions

### What is the forward process in diffusion models?

The forward process gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. Small Gaussian noise is incrementally added over many timesteps until the original data becomes indistinguishable from pure noise.
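The closed-form noising step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard DDPM-style forward process, not code from the post; the schedule parameters (`T`, the beta range) and helper name `forward_diffuse` are my own illustrative choices.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)  # fresh Gaussian noise
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (illustrative)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative fraction of signal kept

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)              # toy "data" sample

x_early = forward_diffuse(x0, 10, alpha_bar, rng)     # still close to x0
x_late = forward_diffuse(x0, T - 1, alpha_bar, rng)   # nearly pure noise

print(alpha_bar[0], alpha_bar[-1])       # signal fraction shrinks toward zero
```

Because each step mixes in a little Gaussian noise, `alpha_bar` decays monotonically: at `t = 0` the sample is almost entirely data, and by `t = T - 1` it is statistically indistinguishable from the standard-normal prior.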

### Where can I learn diffusion models for free?

Stefano Ermon's Stanford CS236 Deep Generative Models course covers diffusion models with full mathematical foundations. The course website has materials at deepgenerativemodels.github.io, and complete lecture recordings are available on YouTube.

### What does Stanford CS236 cover?

CS236 covers the probabilistic foundations and learning algorithms for deep generative models, including variational autoencoders, generative adversarial networks, autoregressive models, normalizing flows, energy-based models, and score-based diffusion models.


---

*Philipp D. Dubach — [http://philippdubach.com/](http://philippdubach.com/) — 2025*