HackerBook Static Analysis

I’ve been working on an empirical study of attention dynamics on Hacker News, analyzing decay curves, preferential attachment, survival, and early-engagement prediction using ~300k items and 72k temporal snapshots collected in December 2025.

[Figure: Overview of the SSRN preprint]

Preprint available on SSRN

Today I saw this Show HN: "22GB of Hacker News in SQLite, served via WASM shards". I downloaded the HackerBook export and ran a subset of my paper’s analytics on it.

Caveat: HackerBook is a single static snapshot with no time-series data, so I could not run lifecycle analysis, early-velocity prediction, or decay fitting. What can be computed from it: distributional statistics, inequality metrics, and circadian patterns.

[Figure: Summary statistics]

Score distribution (CCDF + power-law fit)

[Figure: Score distribution (CCDF) with power-law fit on HackerBook shard sample]
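For readers who want to reproduce a plot like this, here is a minimal sketch of an empirical CCDF and a tail-exponent estimate. It is not necessarily the fitting procedure used in the paper: it assumes the continuous maximum-likelihood estimator, which is only an approximation for integer scores, and the function names and the `xmin` cutoff are hypothetical choices for illustration.

```python
import numpy as np

def empirical_ccdf(values):
    """Return the unique sorted values and P(X >= x) for each of them."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # first_idx[i] is the position of the first occurrence of uniq[i],
    # so n - first_idx[i] counts the observations that are >= uniq[i].
    uniq, first_idx = np.unique(x, return_index=True)
    return uniq, (n - first_idx) / n

def fit_power_law_alpha(values, xmin):
    """Continuous MLE of the exponent alpha for the tail x >= xmin.

    Assumption: scores are treated as continuous, which is only an
    approximation for integer-valued data.
    """
    x = np.asarray(values, dtype=float)
    tail = x[x >= xmin]
    if tail.size == 0:
        raise ValueError("no observations at or above xmin")
    log_sum = np.sum(np.log(tail / xmin))
    if log_sum == 0:
        raise ValueError("all tail values equal xmin; alpha is undefined")
    alpha = 1.0 + tail.size / log_sum
    return alpha, tail.size
```

The choice of `xmin` strongly affects the estimate, so it is worth trying several cutoffs and checking how stable the exponent is before reading much into a single value.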

Attention inequality (Lorenz curve + Gini)

[Figure: Lorenz curve of story scores (attention inequality) with sample Gini]
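As a rough illustration of how a Lorenz curve and a sample Gini coefficient can be computed from a list of story scores, here is a sketch using the standard sorted-rank formula. The function names are hypothetical, and non-negative scores are assumed.

```python
import numpy as np

def lorenz_curve(scores):
    """Return (population share, cumulative score share), both starting at 0."""
    x = np.sort(np.asarray(scores, dtype=float))
    if x.size == 0 or x.sum() <= 0:
        raise ValueError("need at least one positive score")
    population = np.arange(0, x.size + 1) / x.size
    share = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return population, share

def gini(scores):
    """Sample Gini coefficient from the sorted-rank formula.

    G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with x sorted ascending
    and i running from 1 to n. Assumes non-negative scores.
    """
    x = np.sort(np.asarray(scores, dtype=float))
    n = x.size
    if n == 0 or x.sum() <= 0:
        raise ValueError("need at least one positive score")
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))
```

A Gini of 0 means every story received the same score. With this uncorrected sample formula, the maximum for n stories is (n - 1)/n, reached when a single story holds all the points.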

Circadian patterns (volume vs mean score, UTC)

[Figure: Circadian patterns on Hacker News (UTC): posting volume vs mean score]
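The hourly aggregation behind a plot like this can be sketched as follows. It assumes each story is available as a `(unix_timestamp, score)` pair, with the timestamp in seconds as in the public Hacker News API. How those fields are stored in the HackerBook export is not shown here, and `hourly_profile` is a hypothetical name.

```python
from datetime import datetime, timezone

def hourly_profile(items):
    """Aggregate (unix_timestamp, score) pairs by UTC hour of posting.

    Returns two 24-element lists: posting volume per hour and mean score
    per hour (None for hours with no posts).
    """
    counts = [0] * 24
    totals = [0.0] * 24
    for timestamp, score in items:
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
        counts[hour] += 1
        totals[hour] += score
    means = [totals[h] / counts[h] if counts[h] else None for h in range(24)]
    return counts, means
```

Mean score is sensitive to a handful of very high-scoring stories in low-volume hours, so plotting the median next to it is a reasonable sanity check.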

Score vs direct comments (proxy)

[Figure: Score vs direct comments (proxy from reply edges), log-log scatter]
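One way to build a direct-comment proxy from reply edges is to count the comments whose parent is a story. The sketch below assumes items are dicts with `id`, `type`, and `parent` keys, as in the public Hacker News API; the actual HackerBook schema may differ, and `direct_comment_counts` is a hypothetical helper.

```python
def direct_comment_counts(items):
    """Count comments whose parent is a story (top-level replies only).

    Assumption: each item is a dict with "id" and "type", and comments
    also carry a "parent" id. Replies to other comments are ignored,
    which is why this is only a proxy for total discussion size.
    """
    counts = {it["id"]: 0 for it in items if it["type"] == "story"}
    for it in items:
        if it["type"] == "comment" and it.get("parent") in counts:
            counts[it["parent"]] += 1
    return counts
```

Because nested replies are excluded, this undercounts discussion on stories with deep threads.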

Direct comments distribution (CCDF, proxy)

[Figure: Direct comments distribution (proxy) shown as CCDF]

Mean score vs direct comments (binned, proxy)

[Figure: Mean score vs direct comments (proxy), binned in log-spaced buckets]
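Log-spaced binning of this kind can be sketched as below. The number of bins, the decision to drop stories with zero direct comments (they have no logarithm), and the function name are all assumptions made for illustration.

```python
import numpy as np

def log_binned_mean(comments, scores, n_bins=10):
    """Mean score per log-spaced bucket of direct-comment count.

    Stories with zero comments are dropped. Returns the geometric bin
    centers, the mean score per bin (NaN for empty bins), and bin counts.
    """
    c = np.asarray(comments, dtype=float)
    s = np.asarray(scores, dtype=float)
    keep = c > 0
    c, s = c[keep], s[keep]
    if c.size == 0:
        raise ValueError("no stories with at least one comment")
    edges = np.logspace(np.log10(c.min()), np.log10(c.max()), n_bins + 1)
    # digitize returns 1-based bin indices; clip so the minimum and maximum
    # values land in the first and last bins despite rounding at the edges.
    idx = np.clip(np.digitize(c, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=s, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers, means, counts
```

Reporting the per-bin counts alongside the means makes it easy to see which buckets in the sparse upper tail rest on only a few stories.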

Sources: Paper (SSRN) · hn-archiver repo · HackerBook repo