Recommend Arena.

Twelve recommendation-system designs benchmarked head-to-head on the same dataset and ground-truth queries.

Python/ChromaDB/NetworkX/Claude Code

What it is

Twelve recommender architectures — knowledge graph, pure embedding, LLM-as-judge, hybrid, TF-IDF, Bayesian, and more — implemented against a single shared protocol and evaluated on the same dataset of sports gear. Same queries in, same ground truth out, NDCG and MRR decide.

Why I made it

I struggle with decision anxiety — mostly around things like picking skis or running shoes. The goal was to find an architecture that could recommend things well from reviews, whether the user's input is vague ("something for beginners") or detailed ("daily trainer, neutral, wide toe box"). I had Claude prototype multiple paths and benchmark them against each other. The fine-tuned embedding space ended up as my championed approach.