Hanger
Archived
AI/ML

AI fashion discovery engine: 2K+ users, +40% CTR, +22% relevance gain vs a $50M+ competitor via CLIP fine-tuning + hybrid retrieval.

2K+ Users
+40% CTR Gain
+22% Relevance ↑
15% Latency ↓
$500/mo Infra Cost
Hanger screenshot 1
Hanger screenshot 2

Problem

Fashion shoppers search by vibe ('winter formal but casual streetwear'), not keywords. Traditional systems fail:

  • Keyword matching misses semantic intent
  • Collaborative filtering can't handle cold-start
  • Daydream spent $50M+ but still couldn't solve this

Solution

Three technical optimizations beat their infrastructure:

1. Better Representation

  • Fine-tuned CLIP on fashion data
  • Multi-vector embeddings per product
  • Attribute-aware layers (materials, seasonality, fit, occasion)

2. Hybrid Retrieval

  • PostgreSQL → structured filters (price, size, inventory)
  • Pinecone → semantic search (HNSW indexing)
  • Merge + rerank for best results

3. Context-Aware Reasoning

  • FastAPI agents reason across dimensions: weather vs materials, occasion vs dress code
  • Budget + inventory constraints
  • Handles queries like 'cold NYC rooftop party under $200'

Impact

Outcompeted a $50M+ competitor on $500/month infrastructure:

  • 2K+ users
  • +40% CTR improvement
  • +28% conversion increase
  • +22% relevance vs baseline
  • 15% latency reduction, with sub-second recommendations
  • Production-ready in 3 months

Why I Built This

I was frustrated by endless scrolling through keyword search that didn't understand style intent. I built Hanger to match how people actually think about fashion: by vibe, context, and aesthetic, not just product attributes. I also wanted to prove you could beat well-funded competitors through smarter technical choices, not just scale.

Architecture

CLIP (Fine-tuned)
FastAPI
PostgreSQL
Pinecone
Redis
Docker
Vercel Edge

Technical Highlights

1

Fine-tuned CLIP embeddings on fashion-specific data with multi-vector representations per product (materials, seasonality, fit, occasion) → 22% relevance improvement over baseline

2

Built hybrid retrieval pipeline: PostgreSQL structured filters (price, size, inventory) + Pinecone semantic search (HNSW indexing) + merge/rerank → 15% latency reduction, sub-second recommendations

3

Designed context-aware reasoning layer with FastAPI agents: reasons across weather vs materials, color palettes vs seasonality, occasion vs dress code, budget constraints → handles queries like 'cold NYC rooftop party under $200'

4

Optimized for lean production: cached embeddings, query batching, reduced vector dimensions with minimal accuracy loss → $500/month infrastructure vs Daydream's $50M+ spend

5

Deployed on Vercel Edge with Redis caching—handling 2K+ users with <100ms p50 latency across 1M+ product catalog

How We Built It

1

Ingestion + Catalog Pipeline

We built Hanger's product catalog as a daily-refresh system that could ingest thousands of SKUs across retailers and keep availability + pricing current.

Retailer Scraping with Puppeteer

Used Puppeteer to crawl retailer category pages and product pages with normalized field extraction and per-retailer parsing modules so one site change didn't break the entire pipeline.

  • Extracted normalized fields: title, brand, price, sale_price, currency, images, sizes, color, material, category, product_url, retailer, SKU/variant IDs
  • Anti-breakage patterns: retry logic, exponential backoff, selector fallbacks
  • Per-retailer parsing modules isolated failures to individual crawlers
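The anti-breakage retry pattern is language-agnostic; the real pipeline runs on Puppeteer/Node, but the shape is the same. A minimal sketch in Python (function names and delay values are illustrative):

```python
import random
import time

def fetch_with_retries(fetch, url, max_retries=4, base_delay=1.0):
    """Retry a flaky fetch with exponential backoff + jitter.

    `fetch` is a hypothetical callable standing in for a Puppeteer
    page load; it should raise on transient failures.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the failure to the job runner
            # Exponential backoff (1x, 2x, 4x, ...) plus jitter so parallel
            # crawlers don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Selector fallbacks follow the same idea at the parsing layer: try the primary CSS selector, then each fallback, before flagging the crawler as broken.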

Scraping at Scale

  • Distributed scrape jobs batched by retailer/category with controlled concurrency to avoid rate limits
  • Stored raw HTML snapshots + parsed payloads for debugging and diffing when retailers changed page structure
  • Predictable run durations with worker pool limiting and per-domain throttles
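The per-domain throttle can be sketched with an asyncio-style worker pool (the production pipeline is Node-based; the concurrency limit of 3 and the names here are illustrative):

```python
import asyncio
from collections import defaultdict

# One semaphore per retailer domain caps concurrent requests to that site
# (a limit of 3 is illustrative, not the production value)
_domain_limits = defaultdict(lambda: asyncio.Semaphore(3))

async def scrape_job(domain, page_url, scrape_fn):
    """Run a single scrape task under its domain's concurrency cap."""
    async with _domain_limits[domain]:
        return await scrape_fn(domain, page_url)

async def run_batch(jobs, scrape_fn):
    """jobs: iterable of (domain, page_url) pairs; fan out with throttling."""
    return await asyncio.gather(*(scrape_job(d, u, scrape_fn) for d, u in jobs))
```

Because each domain has its own semaphore, a slow retailer can't starve the others, and run durations stay predictable.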

Admin Interface + Cron-Controlled Refresh

Built an admin dashboard for full operational control over the ingestion pipeline.

  • Enable/disable retailers, categories, or individual crawlers on the fly
  • Set scrape frequency per crawler (e.g., 1×/day or more for fast-moving inventory)
  • Trigger manual re-runs and view job health: success %, failures, last run, duration
  • Upserts on each run; missing items marked 'inactive' instead of hard-deleted for churn tracking + recovery
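The upsert-plus-inactive pattern, sketched here against SQLite for brevity (the production store is PostgreSQL, which uses the same `ON CONFLICT ... DO UPDATE` syntax; the schema and field names are simplified):

```python
import sqlite3

def refresh_catalog(conn, retailer, scraped):
    """Upsert one scrape run; unseen SKUs become 'inactive', not deleted.

    `scraped` rows use hypothetical keys (sku, title, price); the real
    pipeline carries the full normalized field set.
    """
    cur = conn.cursor()
    for p in scraped:
        cur.execute(
            """INSERT INTO products (sku, retailer, title, price, status)
               VALUES (?, ?, ?, ?, 'active')
               ON CONFLICT(sku) DO UPDATE SET
                   title = excluded.title,
                   price = excluded.price,
                   status = 'active'""",
            (p["sku"], retailer, p["title"], p["price"]),
        )
    seen = [p["sku"] for p in scraped]
    if seen:
        placeholders = ",".join("?" * len(seen))
        cur.execute(
            f"""UPDATE products SET status = 'inactive'
                WHERE retailer = ? AND sku NOT IN ({placeholders})""",
            [retailer, *seen],
        )
    else:  # empty run: everything from this retailer churns
        cur.execute(
            "UPDATE products SET status = 'inactive' WHERE retailer = ?",
            [retailer],
        )
    conn.commit()
```

Keeping inactive rows means a product that reappears next run flips back to active with its history intact, which is what makes churn tracking possible.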

Embeddings + Vector Indexing (CLIP + Pinecone)

For every product we precomputed embeddings at ingestion time so nothing ran on the user query path.

  • Generated CLIP image embedding + text embedding per product at ingestion
  • Stored in Pinecone: vector = CLIP embedding, metadata = retailer, price range, category, size availability, gender, color
  • Precomputing kept search fast and latency predictable — no embedding generation at query time
2

Search Agent

Goal: 'show me items like this' and 'find me a [style] outfit' — fast, relevant, and filterable.

Query → Retrieval → Rerank

  • Parse intent + constraints from user query (e.g. 'black mini dress under $120' → color=black, category=dress, price<120)
  • Retrieve candidates from Pinecone via semantic vector search (CLIP text embedding of query, or image embedding for inspo images) + hard metadata filters (price, retailer, category, in-stock sizes)
  • Re-rank with blended score: vector similarity distance + inventory confidence (in-stock at last refresh) + preference boosts (brands saved, liked styles)
  • Return paginated results with stable sorting so items don't shuffle between pages
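The blended rerank step can be sketched as follows (weights and signal names are illustrative, not the production values):

```python
def rerank(candidates, liked_brands, w_sim=0.6, w_stock=0.2, w_pref=0.2):
    """Blend vector similarity with inventory confidence and preference boosts.

    candidates: dicts with 'score' (similarity from the vector store, 0..1),
    'in_stock' (as of last catalog refresh), and 'brand'.
    """
    def blended(c):
        stock = 1.0 if c["in_stock"] else 0.3   # soft-demote, don't drop
        pref = 1.0 if c["brand"] in liked_brands else 0.0
        return w_sim * c["score"] + w_stock * stock + w_pref * pref
    # sorted() is stable, so ties keep retrieval order and pages don't shuffle
    return sorted(candidates, key=blended, reverse=True)
```

Using a stable sort here is what makes the "stable sorting" guarantee cheap: items with equal blended scores keep their retrieval order on every request.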
3

Recommendation Engine

A personalization loop built on implicit + explicit signals, continuously updating a per-user preference profile.

Signals

  • Implicit: clicks, dwell time, add-to-collection, 'more like this'
  • Explicit: likes/dislikes, brands to follow/avoid, price comfort range, preferred categories

Recommendation Flow

  • Maintain a user preference profile: embedding centroid from liked items, negative centroid from disliked items, structured constraints (price, categories)
  • Periodically query Pinecone with the user's preference embedding + apply metadata filters
  • Diversify results using clustering + similarity thresholds to avoid 20 near-identical black tops
  • Delivered as: 'For You' feed, 'Because you liked X', 'New in your style' (items scraped in last 24h prioritized)
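The profile-centroid and diversification steps above can be sketched in a few lines (the negative weight, threshold, and similarity function are illustrative):

```python
def preference_vector(liked, disliked, neg_weight=0.5):
    """Profile embedding: liked centroid minus a weighted disliked centroid."""
    def centroid(vecs):
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
    pos = centroid(liked)
    if not disliked:
        return pos
    neg = centroid(disliked)
    return [p - neg_weight * n for p, n in zip(pos, neg)]

def diversify(ranked, similarity, threshold=0.9):
    """Greedy pick: skip items too similar to anything already chosen."""
    picked = []
    for item in ranked:
        if all(similarity(item, prev) < threshold for prev in picked):
            picked.append(item)
    return picked
```

The preference vector is what gets sent to Pinecone as the query embedding; `diversify` then runs over the returned candidates so the feed doesn't collapse into twenty near-identical black tops.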
4

Frontend Architecture

A React search + feed UI built for stability on a large, constantly changing catalog.

React + Server-Driven Pagination

  • Consistent product card component + skeleton loaders across all views
  • Cursor-based pagination preferred over offset — prevents duplicates when inventory changes mid-scroll
  • API returns items + nextCursor; frontend requests next page on scroll or 'Load more'
  • Cached results per query so back/forward navigation is instant
  • Debounced query input to avoid firing on every keystroke
  • Optimistic UI for likes/saves with rollback on failure
  • Search filters wired directly to backend metadata filters — no expensive client-side re-processing
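The server side of the cursor contract can be sketched as a pure function (the real endpoint lives in FastAPI and reads from the ranked retrieval results; the response keys here mirror the items + nextCursor shape, but the implementation is illustrative):

```python
def paginate(ranked, cursor=None, limit=20):
    """Cursor-based page over a stably ranked list of {'id': ...} items.

    cursor is the last id of the previous page (None for page one).
    """
    start = 0
    if cursor is not None:
        ids = [item["id"] for item in ranked]
        # If the cursor item disappeared (e.g. sold out), restart from the top
        start = ids.index(cursor) + 1 if cursor in ids else 0
    page = ranked[start:start + limit]
    has_more = start + limit < len(ranked)
    return {"items": page, "nextCursor": page[-1]["id"] if page and has_more else None}
```

Anchoring on the last-seen id rather than an offset is what prevents duplicates and gaps when items are inserted or removed between page requests.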

Why This Mattered

The catalog is large and constantly changing — stock and pricing update daily. Stable pagination + caching + consistent ranking prevented duplicated items across pages, missing items when new inventory arrived mid-scroll, and jarring reshuffles when toggling filters.

Technologies & Tags

Vector Search
ML/Agents
Product