Session 10: Two-Tower, Evaluation & Retrieval - Data Science Workflow & Applications

⚠️ Please note we are currently reworking the lecture and still preparing the course materials. You will find this message wherever the course materials are being updated / not finalized. ⚠️

Part 1 — Two-Tower Architecture¶

Two-Tower architecture: separate user and item encoders, dot-product scoring
Collaborative baseline (RecBole): Train DSSM on MovieLens ratings via RecBole — ratings-only, no multimodal
Multimodal two-tower (custom PyTorch): Image tower (pre-trained ResNet/ViT on TMDB posters) + text/interaction tower (user or plot embeddings), contrastive/dot-product loss to align both towers into a shared embedding space
Why not RecBole for multimodal? RecBole consumes pre-extracted features — cannot train encoders end-to-end
Extract learned embeddings → load into ChromaDB
Key insight: the vector DB doesn’t care where embeddings come from (text vs. collaborative vs. multimodal) — it just indexes and retrieves

Part 2 — Evaluation & Method Comparison¶

Metrics: Precision@k, Recall@k
Compare all methods on the same test set: content-based (L7) vs. SVD/AutoRec (L8) vs. neural embeddings (L9) vs. collaborative two-tower (RecBole) vs. multimodal two-tower (custom PyTorch)
Cold-start evaluation: held-out items — which method still works?

Cluster demo¶

kubectl, GPU access, batch embedding on MovieLens 25M

Practice: Mini final project — embed dataset, generate recommendations with 2+ methods, compare P@k/R@k, interpret results