The frontend is vanilla JavaScript: no React, no build steps, just DOM manipulation and CSS transitions. You drag a card, it follows your finger, and it snaps away with a satisfying animation. Behind the scenes, the app tracks everything: votes (like/neutral/dislike), time spent viewing each card, and whether you actually opened the link. If I swipe right but don’t click through, that’s a signal. If I spend 0.3 seconds on a card before swiping left, that’s a signal too.

Add feeds as name,url pairs, click refresh, and the fetcher pulls articles with proper HTTP caching (ETag/Last-Modified) to avoid hammering servers. You can use your own feed list or load a predefined one. Thanks to Manuel Moreale, who created blogroll, I was able to get an OPML export and load all curated RSS feeds directly. Something similar works with minifeed or Kagi’s smallweb. Or you can use one of the Hacker News RSS feeds. If that feels too adventurous, I created curated feeds for the most popular HN bloggers.
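To make the caching part concrete, here is a minimal sketch of a conditional GET using only the standard library. It is not the app's actual fetcher: the function names, the user-agent string, and the timeout are my assumptions for illustration. The idea is to send back the ETag and Last-Modified values from the previous fetch so the server can answer 304 Not Modified instead of resending the whole feed.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {"User-Agent": "rss-reader-sketch/0.1"}  # hypothetical agent string
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None when the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since the last fetch: keep the cached copy
            return None, etag, last_modified
        raise
```

Storing the returned ETag and Last-Modified values next to each feed and passing them into the next `fetch_feed` call is what keeps repeated refreshes cheap for both sides.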
To build the model, I started with XGBoost and some hand-engineered features (title length, word count, time of day, feed source). The result was decent: around 66% ROC-AUC. It learned that I dislike short, clickbaity titles, but it didn’t understand context.
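As a rough illustration of what hand-engineered features like these can look like, here is a small sketch. The function name, argument names, and dictionary keys are hypothetical and not taken from the project; the real feature set may be computed differently.

```python
from datetime import datetime


def extract_features(title, description, published, feed_name):
    """Turn one article into a flat dict of simple features."""
    words = f"{title} {description}".split()
    return {
        "title_length": len(title),       # characters in the title
        "word_count": len(words),         # words in title + description
        "hour_of_day": published.hour,    # 0-23, from the publish timestamp
        "feed_source": feed_name,         # categorical; encode before training
    }


example = extract_features(
    title="Why SQLite is enough",
    description="A short note on small databases.",
    published=datetime(2025, 1, 6, 8, 30),
    feed_name="example-blog",
)
```

Features like these are cheap to compute, which is their main appeal, but they only describe the shape of an article, not what it is about.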
The upgrade was MPNet (all-mpnet-base-v2 from sentence-transformers) to generate 768-dimensional embeddings for every article’s title and description. Combined with engineered features—feed preferences, temporal patterns, text statistics—this gets fed into a Hybrid Random Forest.
    import numpy as np

    # mpnet, feature_pipeline, and model are assumed to be loaded elsewhere
    def predict_preference(article):
        # Generate semantic embeddings (768 dims)
        embeddings = mpnet.encode(f"{article.title} {article.description}")
        # Extract behavioral + text features
        features = feature_pipeline.transform(article)
        # Predict with Hybrid RF; predict_proba expects a 2D (1, n_features) array
        X = np.hstack([embeddings, np.ravel(features)]).reshape(1, -1)
        return model.predict_proba(X)

Training happens on Google Colab (on a free T4 GPU, or faster on an H100 or A100 with a subscription). Upload your training CSV, run the notebook, and download a .pkl file.
The problem with transformer models is latency. Generating MPNet embeddings takes ~1 second per article. In a swipe interface, that lag is unbearable. The next best thing is a preload queue. While you’re reading the current card, the backend is scoring and fetching the next 3-5 articles in the background. By the time you swipe, the next card is already waiting.
    async loadNextBatch() {
        const excludeIds = this.cardQueue.map(c => c.id).join(',');
        const response = await fetch(`/api/posts/batch?count=3&exclude=${excludeIds}`);
        const data = await response.json();
        this.cardQueue.push(...data.posts);
    }

Article selection uses Thompson Sampling: 80% of the time it shows what the model thinks you’ll like (exploit), 20% it throws in something unexpected (explore). This prevents the filter bubble problem and lets the model discover if your tastes have changed.
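The 80/20 split can be sketched in a few lines. Note the assumption: this sketch uses a plain fixed-probability split (epsilon-greedy style), which is the simplest way to get that behavior. A full Thompson Sampling implementation would instead sample from a posterior over each option, and the app's actual selection code may differ. The function name and the `(post_id, score)` input format are hypothetical.

```python
import random


def pick_next(scored_posts, exclude_ids=(), explore_rate=0.2, rng=random):
    """Pick one post id: usually the top-scored one, sometimes a random one.

    scored_posts: list of (post_id, predicted_like_probability) pairs.
    exclude_ids: ids already sitting in the preload queue.
    """
    excluded = set(exclude_ids)
    candidates = [(pid, score) for pid, score in scored_posts if pid not in excluded]
    if not candidates:
        return None
    if rng.random() < explore_rate:
        return rng.choice(candidates)[0]  # explore: any remaining post
    return max(candidates, key=lambda item: item[1])[0]  # exploit: best score
```

Passing the queued ids as `exclude_ids` mirrors the `exclude` parameter in the batch request, so a card that is already preloaded is not picked twice.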
The whole system is designed as a closed loop:
- Swipe → votes get stored in SQLite
- Export → download training CSV with votes + engagement data
- Train → run Colab notebook, get new model
- Upload → drag-drop the .pkl file back into the app
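The export step in this loop could look roughly like the sketch below. The table name `votes` and its columns are assumptions for illustration, not the app's real schema; the point is only that votes and engagement data come out of SQLite as one flat CSV.

```python
import csv
import io
import sqlite3


def export_training_csv(conn):
    """Return all votes plus engagement columns as CSV text."""
    rows = conn.execute(
        "SELECT post_id, vote, view_seconds, link_opened FROM votes ORDER BY post_id"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["post_id", "vote", "view_seconds", "link_opened"])
    writer.writerows(rows)
    return buffer.getvalue()


# Tiny in-memory database standing in for the app's SQLite file
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (post_id INTEGER, vote TEXT, view_seconds REAL, link_opened INTEGER)"
)
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?, ?)",
    [(1, "like", 12.5, 1), (2, "dislike", 0.3, 0)],
)
csv_text = export_training_csv(conn)
```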
Uploaded models show their ROC-AUC score so you can compare performance across training runs. Activate whichever one works best.
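For anyone unsure how to read that score: ROC-AUC is the probability that a randomly chosen liked article gets a higher predicted score than a randomly chosen disliked one. The sketch below computes it by brute force over all pairs, purely to show the idea. It is not the app's code; in practice scikit-learn's `roc_auc_score` computes the same metric far more efficiently.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


labels = [1, 0, 1, 0]  # 1 = liked, 0 = disliked
model_a = roc_auc(labels, [0.9, 0.2, 0.6, 0.4])  # every like outranks every dislike
model_b = roc_auc(labels, [0.9, 0.7, 0.6, 0.4])  # one dislike outranks one like
```

A score of 0.5 means the model ranks no better than chance, so two training runs are only worth comparing by how far above 0.5 each one lands.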
- Backend: Python, Flask, SQLite
- Frontend: Vanilla JS, CSS variables
- ML: scikit-learn, XGBoost, sentence-transformers (MPNet)
- Training: Google Colab (free GPU tier)
Total infrastructure cost: zero. Everything runs locally. No accounts, no cloud dependencies, no tracking.
    git clone https://github.com/philippdubach/rss-swipr.git
    cd rss-swipr
    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    python app.py

The full source and Colab notebook are available on GitHub.