Hanger
Archived
AI/ML

AI fashion discovery engine: 2K+ users, +40% CTR, +22% relevance gain vs a $50M+ competitor via CLIP fine-tuning + hybrid retrieval.

2K+ Users
+40% CTR Gain
+22% Relevance ↑
15% Latency ↓
$500/mo Infra Cost
Hanger screenshot 1
Hanger screenshot 2

Problem

Fashion shoppers search by vibe ('winter formal but casual streetwear'), not keywords. Traditional systems fail:

  • Keyword matching misses semantic intent
  • Collaborative filtering can't handle cold-start
  • Daydream spent $50M+ but still couldn't solve this

Solution

Three technical optimizations beat their infrastructure:

1. Better Representation

  • Fine-tuned CLIP on fashion data
  • Multi-vector embeddings per product
  • Attribute-aware layers (materials, seasonality, fit, occasion)

2. Hybrid Retrieval

  • PostgreSQL → structured filters (price, size, inventory)
  • Pinecone → semantic search (HNSW indexing)
  • Merge + rerank for best results

3. Context-Aware Reasoning

  • FastAPI agents reason across dimensions: weather vs materials, occasion vs dress code
  • Budget + inventory constraints
  • Handles queries like 'cold NYC rooftop party under $200'

Impact

Outcompeted a $50M+ competitor on $500/month infrastructure:

  • 2K+ users
  • +40% CTR improvement
  • +28% conversion increase
  • +22% relevance vs baseline
  • 15% latency reduction, with sub-second recommendations
  • Production-ready in 3 months

Why I Built This

I was frustrated by endless scrolling through keyword search that didn't understand style intent. I built Hanger to match how people actually think about fashion: by vibe, context, and aesthetic, not just product attributes. I also wanted to prove you could beat well-funded competitors through smarter technical choices, not just scale.

Architecture

CLIP (Fine-tuned)
FastAPI
PostgreSQL
Pinecone
Redis
Docker
Vercel Edge

Technical Highlights

1

Fine-tuned CLIP embeddings on fashion-specific data with multi-vector representations per product (materials, seasonality, fit, occasion) → 22% relevance improvement over baseline

2

Built hybrid retrieval pipeline: PostgreSQL structured filters (price, size, inventory) + Pinecone semantic search (HNSW indexing) + merge/rerank → 15% latency reduction, sub-second recommendations

3

Designed context-aware reasoning layer with FastAPI agents: reasons across weather vs materials, color palettes vs seasonality, occasion vs dress code, budget constraints → handles queries like 'cold NYC rooftop party under $200'

4

Optimized for lean production: cached embeddings, query batching, reduced vector dimensions with minimal accuracy loss → $500/month infrastructure vs Daydream's $50M+ spend

5

Deployed on Vercel Edge with Redis caching—handling 2K+ users with <100ms p50 latency across 1M+ product catalog

How We Built It

1

Ingestion + Catalog Pipeline

We built Hanger's product catalog as a daily-refresh system that could ingest thousands of SKUs across retailers and keep availability + pricing current.

Retailer Scraping with Puppeteer

Used Puppeteer to crawl retailer category pages and product pages with normalized field extraction and per-retailer parsing modules so one site change didn't break the entire pipeline.

  • Extracted normalized fields: title, brand, price, sale_price, currency, images, sizes, color, material, category, product_url, retailer, SKU/variant IDs
  • Anti-breakage patterns: retry logic, exponential backoff, selector fallbacks
  • Per-retailer parsing modules isolated failures to individual crawlers
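The anti-breakage retry pattern is language-agnostic; the real pipeline runs on Puppeteer/Node, but the shape is the same. A minimal sketch in Python (function names and delay values are illustrative):

```python
import random
import time

def fetch_with_retries(fetch, url, max_retries=4, base_delay=1.0):
    """Retry a flaky fetch with exponential backoff + jitter.

    `fetch` is a hypothetical callable standing in for a Puppeteer
    page load; it should raise on transient failures.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the failure to the job runner
            # Exponential backoff (1x, 2x, 4x, ...) plus jitter so parallel
            # crawlers don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Selector fallbacks follow the same idea at the parsing layer: try the primary CSS selector, then each fallback, before flagging the crawler as broken.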

Scraping at Scale

  • Distributed scrape jobs batched by retailer/category with controlled concurrency to avoid rate limits
  • Stored raw HTML snapshots + parsed payloads for debugging and diffing when retailers changed page structure
  • Predictable run durations with worker pool limiting and per-domain throttles
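The per-domain throttle can be sketched with an asyncio-style worker pool (the production pipeline is Node-based; the concurrency limit of 3 and the names here are illustrative):

```python
import asyncio
from collections import defaultdict

# One semaphore per retailer domain caps concurrent requests to that site
# (a limit of 3 is illustrative, not the production value)
_domain_limits = defaultdict(lambda: asyncio.Semaphore(3))

async def scrape_job(domain, page_url, scrape_fn):
    """Run a single scrape task under its domain's concurrency cap."""
    async with _domain_limits[domain]:
        return await scrape_fn(domain, page_url)

async def run_batch(jobs, scrape_fn):
    """jobs: iterable of (domain, page_url) pairs; fan out with throttling."""
    return await asyncio.gather(*(scrape_job(d, u, scrape_fn) for d, u in jobs))
```

Because each domain has its own semaphore, a slow retailer can't starve the others, and run durations stay predictable.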

Admin Interface + Cron-Controlled Refresh

Built an admin dashboard for full operational control over the ingestion pipeline.

  • Enable/disable retailers, categories, or individual crawlers on the fly
  • Set scrape frequency per crawler (e.g., 1×/day or more for fast-moving inventory)
  • Trigger manual re-runs and view job health: success %, failures, last run, duration
  • Upserts on each run; missing items marked 'inactive' instead of hard-deleted for churn tracking + recovery
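The upsert-plus-inactive pattern, sketched here against SQLite for brevity (the production store is PostgreSQL, which uses the same `ON CONFLICT ... DO UPDATE` syntax; the schema and field names are simplified):

```python
import sqlite3

def refresh_catalog(conn, retailer, scraped):
    """Upsert one scrape run; unseen SKUs become 'inactive', not deleted.

    `scraped` rows use hypothetical keys (sku, title, price); the real
    pipeline carries the full normalized field set.
    """
    cur = conn.cursor()
    for p in scraped:
        cur.execute(
            """INSERT INTO products (sku, retailer, title, price, status)
               VALUES (?, ?, ?, ?, 'active')
               ON CONFLICT(sku) DO UPDATE SET
                   title = excluded.title,
                   price = excluded.price,
                   status = 'active'""",
            (p["sku"], retailer, p["title"], p["price"]),
        )
    seen = [p["sku"] for p in scraped]
    if seen:
        placeholders = ",".join("?" * len(seen))
        cur.execute(
            f"""UPDATE products SET status = 'inactive'
                WHERE retailer = ? AND sku NOT IN ({placeholders})""",
            [retailer, *seen],
        )
    else:  # empty run: everything from this retailer churns
        cur.execute(
            "UPDATE products SET status = 'inactive' WHERE retailer = ?",
            [retailer],
        )
    conn.commit()
```

Keeping inactive rows means a product that reappears next run flips back to active with its history intact, which is what makes churn tracking possible.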

Embeddings + Vector Indexing (CLIP + Pinecone)

For every product we precomputed embeddings at ingestion time so nothing ran on the user query path.

  • Generated CLIP image embedding + text embedding per product at ingestion
  • Stored in Pinecone: vector = CLIP embedding, metadata = retailer, price range, category, size availability, gender, color
  • Precomputing kept search fast and latency predictable — no embedding generation at query time
2

Search Agent

Goal: 'show me items like this' and 'find me a [style] outfit' — fast, relevant, and filterable.

Query → Retrieval → Rerank

  • Parse intent + constraints from user query (e.g. 'black mini dress under $120' → color=black, category=dress, price<120)
  • Retrieve candidates from Pinecone via semantic vector search (CLIP text embedding of query, or image embedding for inspo images) + hard metadata filters (price, retailer, category, in-stock sizes)
  • Re-rank with blended score: vector similarity distance + inventory confidence (in-stock at last refresh) + preference boosts (brands saved, liked styles)
  • Return paginated results with stable sorting so items don't shuffle between pages
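The blended rerank step can be sketched as follows (weights and signal names are illustrative, not the production values):

```python
def rerank(candidates, liked_brands, w_sim=0.6, w_stock=0.2, w_pref=0.2):
    """Blend vector similarity with inventory confidence and preference boosts.

    candidates: dicts with 'score' (similarity from the vector store, 0..1),
    'in_stock' (as of last catalog refresh), and 'brand'.
    """
    def blended(c):
        stock = 1.0 if c["in_stock"] else 0.3   # soft-demote, don't drop
        pref = 1.0 if c["brand"] in liked_brands else 0.0
        return w_sim * c["score"] + w_stock * stock + w_pref * pref
    # sorted() is stable, so ties keep retrieval order and pages don't shuffle
    return sorted(candidates, key=blended, reverse=True)
```

Using a stable sort here is what makes the "stable sorting" guarantee cheap: items with equal blended scores keep their retrieval order on every request.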
3

Recommendation Engine

A personalization loop built on implicit + explicit signals, continuously updating a per-user preference profile.

Signals

  • Implicit: clicks, dwell time, add-to-collection, 'more like this'
  • Explicit: likes/dislikes, brands to follow/avoid, price comfort range, preferred categories

Recommendation Flow

  • Maintain a user preference profile: embedding centroid from liked items, negative centroid from disliked items, structured constraints (price, categories)
  • Periodically query Pinecone with the user's preference embedding + apply metadata filters
  • Diversify results using clustering + similarity thresholds to avoid 20 near-identical black tops
  • Delivered as: 'For You' feed, 'Because you liked X', 'New in your style' (items scraped in last 24h prioritized)
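The profile-centroid and diversification steps above can be sketched in a few lines (the negative weight, threshold, and similarity function are illustrative):

```python
def preference_vector(liked, disliked, neg_weight=0.5):
    """Profile embedding: liked centroid minus a weighted disliked centroid."""
    def centroid(vecs):
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
    pos = centroid(liked)
    if not disliked:
        return pos
    neg = centroid(disliked)
    return [p - neg_weight * n for p, n in zip(pos, neg)]

def diversify(ranked, similarity, threshold=0.9):
    """Greedy pick: skip items too similar to anything already chosen."""
    picked = []
    for item in ranked:
        if all(similarity(item, prev) < threshold for prev in picked):
            picked.append(item)
    return picked
```

The preference vector is what gets sent to Pinecone as the query embedding; `diversify` then runs over the returned candidates so the feed doesn't collapse into twenty near-identical black tops.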
4

Frontend Architecture

A React search + feed UI built for stability on a large, constantly changing catalog.

React + Server-Driven Pagination

  • Consistent product card component + skeleton loaders across all views
  • Cursor-based pagination preferred over offset — prevents duplicates when inventory changes mid-scroll
  • API returns items + nextCursor; frontend requests next page on scroll or 'Load more'
  • Cached results per query so back/forward navigation is instant
  • Debounced query input to avoid firing on every keystroke
  • Optimistic UI for likes/saves with rollback on failure
  • Search filters wired directly to backend metadata filters — no expensive client-side re-processing
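The server side of the cursor contract can be sketched as a pure function (the real endpoint lives in FastAPI and reads from the ranked retrieval results; the response keys here mirror the items + nextCursor shape, but the implementation is illustrative):

```python
def paginate(ranked, cursor=None, limit=20):
    """Cursor-based page over a stably ranked list of {'id': ...} items.

    cursor is the last id of the previous page (None for page one).
    """
    start = 0
    if cursor is not None:
        ids = [item["id"] for item in ranked]
        # If the cursor item disappeared (e.g. sold out), restart from the top
        start = ids.index(cursor) + 1 if cursor in ids else 0
    page = ranked[start:start + limit]
    has_more = start + limit < len(ranked)
    return {"items": page, "nextCursor": page[-1]["id"] if page and has_more else None}
```

Anchoring on the last-seen id rather than an offset is what prevents duplicates and gaps when items are inserted or removed between page requests.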

Why This Mattered

The catalog is large and constantly changing — stock and pricing update daily. Stable pagination + caching + consistent ranking prevented duplicated items across pages, missing items when new inventory arrived mid-scroll, and jarring reshuffles when toggling filters.

Technologies & Tags

Vector Search
ML/Agents
Product