Overview
Traditional keyword-based search fails when users search for products using natural language or concepts that don't exactly match product titles. At SiramAI, I built a semantic search engine on the Pinecone vector database and OpenAI embeddings that understands the meaning behind queries, not just keyword matches.
Key Achievements:
- 40% increase in click-through rate (CTR) for product discovery
- Natural language queries like "sustainable winter jacket under $100"
- <200ms query latency for semantic search
- 2.3M+ product embeddings indexed in Pinecone
- 85% user satisfaction with search relevance (vs. 52% with keyword search)
The Problem with Keyword Search
Traditional Search Limitations
Keyword-based search (like Elasticsearch BM25) relies on exact text matching:
```python
# Traditional keyword search
query = "sustainable winter jacket under $100"
# Searches for products containing: ["sustainable", "winter", "jacket", "$100"]
# Misses: "eco-friendly down coat" (different words, same meaning)
```

Problems:
- Vocabulary Mismatch - "eco-friendly" and "sustainable" are treated as unrelated tokens (semantically the same, different words); see the toy example after this list
- No Context Understanding - can't infer that "winter" implies insulation and warmth
- Price/Attribute Filtering - "under $100" requires separate filter logic
- Synonym Blindness - "coat" vs. "jacket" are seen as completely different
- Multilingual Gap - can't handle cross-language queries
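To see the vocabulary mismatch concretely, here is a toy token-overlap matcher. It is deliberately simplistic (real BM25 adds term and document weighting), but it fails in exactly the same way:

```python
def keyword_match(query: str, title: str) -> int:
    """Count shared tokens between query and product title (toy example)."""
    query_tokens = set(query.lower().split())
    title_tokens = set(title.lower().split())
    return len(query_tokens & title_tokens)

query = "sustainable winter jacket"
print(keyword_match(query, "Sustainable Winter Jacket - Blue"))  # 3 -> matched
print(keyword_match(query, "Eco-Friendly Down Coat"))            # 0 -> missed entirely
```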
Real User Query Examples
| User Query | Keyword Search Result | Semantic Search Result |
|---|---|---|
| "sustainable winter jacket under $100" | ā 3 results (exact title match) | ā 47 results (understands eco-friendly, coat, etc.) |
| "laptop for coding students" | ā Shows gaming laptops | ā Shows programming-optimized laptops |
| "formal shoes that won't hurt my feet" | ā Only finds "formal shoes" | ā Finds comfortable dress shoes |
Solution: Semantic Search with Vector Embeddings
How Vector Embeddings Work
Vector embeddings convert text into high-dimensional numerical representations where semantically similar items are close together:
```python
# Text → vector (1536 dimensions with OpenAI ada-002)
embedding("sustainable jacket")   # [0.021, -0.15, 0.82, ...]
embedding("eco-friendly coat")    # [0.019, -0.14, 0.81, ...]  <- close!
embedding("bicycle helmet")       # [0.91, 0.42, -0.33, ...]   <- far!

# Cosine similarity
similarity("sustainable jacket", "eco-friendly coat")  # 0.94 (very similar)
similarity("sustainable jacket", "bicycle helmet")     # 0.12 (not similar)
```

Key Insight: the model learns that "sustainable" and "eco-friendly" are synonyms and that "jacket" and "coat" are similar, even though the words themselves are different.
System Architecture

```
┌─────────────────┐
│   User Query    │  "sustainable winter jacket under $100"
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│          Query Processing Pipeline              │
│  1. Text preprocessing                          │
│  2. OpenAI embedding (1536-dim vector)          │
│  3. Attribute extraction (price, category)      │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│          Pinecone Vector Database               │
│  • 2.3M product embeddings                      │
│  • Cosine similarity search                     │
│  • Metadata filtering (price, brand, etc.)      │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│               Hybrid Ranking                    │
│  • Vector similarity (70%)                      │
│  • BM25 keyword score (20%)                     │
│  • Popularity boost (10%)                       │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│ Top 50 Results  │  ← re-ranked by relevance
└─────────────────┘
```
Implementation Details
1. Generating Product Embeddings
First, we create embeddings for all products in the catalog:
```python
import os
from typing import List, Dict

import openai
import pinecone

# Initialize OpenAI
openai.api_key = os.getenv("OPENAI_API_KEY")

# Initialize Pinecone
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment="us-west1-gcp"
)

# Create index (1536 dimensions for ada-002)
index_name = "product-search"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",   # Cosine similarity for semantic search
        pod_type="p1.x1"   # Performance tier
    )

index = pinecone.Index(index_name)


def create_product_text(product: Dict) -> str:
    """
    Combine product attributes into rich text for embedding.
    More context = better semantic understanding.
    """
    text_parts = [
        product.get("title", ""),
        product.get("description", ""),
        f"Category: {product.get('category', '')}",
        f"Brand: {product.get('brand', '')}",
        f"Material: {product.get('material', '')}",
        f"Color: {product.get('color', '')}",
        # Include tags for semantic richness
        " ".join(product.get("tags", []))
    ]
    return " ".join(filter(None, text_parts))


def get_embedding(text: str) -> List[float]:
    """
    Generate an embedding using the OpenAI ada-002 model.
    Cost: $0.0001 per 1K tokens (~750 words)
    """
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response["data"][0]["embedding"]


def index_products(products: List[Dict], batch_size: int = 100):
    """
    Index products into Pinecone in batches for efficiency.
    """
    vectors = []
    indexed = 0
    for product in products:
        # Create rich text representation
        product_text = create_product_text(product)

        # Generate embedding
        embedding = get_embedding(product_text)

        # Prepare vector with metadata
        vectors.append({
            "id": product["id"],
            "values": embedding,
            "metadata": {
                "title": product["title"],
                "price": product["price"],
                "category": product["category"],
                "brand": product["brand"],
                "image_url": product["image_url"],
                "stock": product["stock"],
                "rating": product.get("rating", 0),
                "num_reviews": product.get("num_reviews", 0)
            }
        })

        # Batch upsert to Pinecone
        if len(vectors) >= batch_size:
            index.upsert(vectors=vectors)
            indexed += len(vectors)
            vectors = []
            print(f"Indexed {indexed} products...")

    # Upsert remaining
    if vectors:
        index.upsert(vectors=vectors)

    stats = index.describe_index_stats()
    print(f"Total products indexed: {stats['total_vector_count']}")

# Example: Index 2.3M products
# products = load_products_from_database()  # Your product catalog
# index_products(products)
```

Indexing Performance:
- Batch size: 100 products/batch
- Throughput: ~500 products/second
- Total time for 2.3M products: ~75 minutes
- Cost: ~$23 (2.3M embeddings × $0.0001 per 1K tokens)
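The ~500 products/second figure relies on batching at the API level as well: the legacy OpenAI Embedding endpoint accepts a list of inputs, so a whole batch can be embedded in one request. A hedged sketch (this helper is my addition; it assumes the client is configured as in the indexing script above):

```python
from typing import List

import openai  # api_key configured as in the indexing script

def get_embeddings_batch(texts: List[str]) -> List[List[float]]:
    """
    Embed many texts in one API call. The legacy Embedding endpoint
    returns one vector per input; each item carries an 'index' field
    matching its input position, so we sort to preserve order.
    """
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=texts
    )
    items = sorted(response["data"], key=lambda d: d["index"])
    return [item["embedding"] for item in items]

# Usage: embed one batch of 100 product texts in a single request
# product_texts = [create_product_text(p) for p in products[:100]]
# vectors = get_embeddings_batch(product_texts)
```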
2. Query-Time Search
When a user searches, we embed their query and find similar products:
```python
import re
from typing import List, Dict, Optional

def extract_price_range(query: str) -> Optional[Dict]:
    """
    Extract price constraints from natural language.
    Examples: "under $100", "between $50 and $200", "less than 500"
    """
    # Pattern: under/below/less than $X
    match = re.search(r'(?:under|below|less than)\s*\$?(\d+)', query, re.IGNORECASE)
    if match:
        return {"$lte": float(match.group(1))}

    # Pattern: over/above/more than $X
    match = re.search(r'(?:over|above|more than)\s*\$?(\d+)', query, re.IGNORECASE)
    if match:
        return {"$gte": float(match.group(1))}

    # Pattern: between $X and $Y
    match = re.search(r'between\s*\$?(\d+)\s*and\s*\$?(\d+)', query, re.IGNORECASE)
    if match:
        return {"$gte": float(match.group(1)), "$lte": float(match.group(2))}

    return None


def semantic_search(
    query: str,
    top_k: int = 50,
    filters: Optional[Dict] = None
) -> List[Dict]:
    """
    Perform semantic search with optional metadata filtering.
    """
    # Extract structured filters from query
    price_filter = extract_price_range(query)

    # Build metadata filter
    metadata_filter = filters or {}
    if price_filter:
        metadata_filter["price"] = price_filter

    # Generate query embedding
    query_embedding = get_embedding(query)

    # Search Pinecone
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True,
        filter=metadata_filter if metadata_filter else None
    )

    # Format results
    products = []
    for match in results["matches"]:
        products.append({
            "id": match["id"],
            "score": match["score"],  # Cosine similarity (0-1)
            "title": match["metadata"]["title"],
            "price": match["metadata"]["price"],
            "category": match["metadata"]["category"],
            "brand": match["metadata"]["brand"],
            "image_url": match["metadata"]["image_url"],
            "rating": match["metadata"].get("rating", 0),
        })
    return products

# Example query
results = semantic_search(
    query="sustainable winter jacket under $100",
    top_k=50
)

for product in results[:5]:
    print(f"{product['title']} - ${product['price']} (score: {product['score']:.3f})")
```

Example Output:
```
Patagonia Recycled Down Parka - $95 (score: 0.912)
North Face Eco ThermoBall Jacket - $89 (score: 0.897)
Columbia Omni-Heat Winter Coat - $79 (score: 0.885)
REI Co-op Sustainable Puffer - $99 (score: 0.871)
Marmot EcoDry Shell Jacket - $92 (score: 0.865)
```
Search Performance:
- Query latency: 180ms average (p99: 250ms)
- Pinecone query: 120ms
- OpenAI embedding: 60ms
- Relevance score: 0.85+ for top results
3. Hybrid Search (Vector + Keyword)
Pure semantic search can miss exact matches. We combine vector search with traditional keyword search:
```python
from typing import Dict, List

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

def hybrid_search(
    query: str,
    top_k: int = 50,
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3
) -> List[Dict]:
    """
    Combine semantic search (Pinecone) with keyword search (Elasticsearch).
    Weights: 70% semantic, 30% keyword for balanced relevance.
    """
    # 1. Semantic search (Pinecone)
    semantic_results = semantic_search(query, top_k=top_k)
    semantic_scores = {r["id"]: r["score"] for r in semantic_results}

    # 2. Keyword search (Elasticsearch BM25)
    es_response = es.search(
        index="products",
        body={
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["title^3", "description", "category", "brand"],
                    "type": "best_fields"
                }
            },
            "size": top_k
        }
    )

    keyword_scores = {}
    for hit in es_response["hits"]["hits"]:
        # Normalize BM25 score to 0-1 range
        normalized_score = hit["_score"] / es_response["hits"]["max_score"]
        keyword_scores[hit["_id"]] = normalized_score

    # 3. Combine scores with a weighted average
    all_product_ids = set(semantic_scores.keys()) | set(keyword_scores.keys())
    hybrid_results = []
    for product_id in all_product_ids:
        semantic_score = semantic_scores.get(product_id, 0)
        keyword_score = keyword_scores.get(product_id, 0)

        # Weighted average
        final_score = (
            vector_weight * semantic_score +
            keyword_weight * keyword_score
        )

        # Add popularity boost (10%)
        # product = get_product_by_id(product_id)
        # popularity_score = product["num_reviews"] / 1000  # Normalize
        # final_score += 0.1 * popularity_score

        hybrid_results.append({
            "id": product_id,
            "score": final_score,
            "semantic_score": semantic_score,
            "keyword_score": keyword_score
        })

    # Sort by final score
    hybrid_results.sort(key=lambda x: x["score"], reverse=True)
    return hybrid_results[:top_k]

# Example
results = hybrid_search("sustainable winter jacket under $100")
```

Why Hybrid Works:
- Semantic search: Catches synonyms, concepts ("eco-friendly" = "sustainable")
- Keyword search: Ensures exact brand names, SKUs are prioritized
- Best of both worlds: +40% CTR over keyword-only search and +28% over semantic-only; a worked example of the weighting follows
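To make the weighting concrete, here is a worked example of the score combination for one product (the individual scores are illustrative, not measurements):

```python
# Worked example of the hybrid weighting for a single product
vector_weight, keyword_weight = 0.7, 0.3

semantic_score = 0.91  # cosine similarity from Pinecone
keyword_score = 0.40   # normalized BM25 score from Elasticsearch

final_score = vector_weight * semantic_score + keyword_weight * keyword_score
print(final_score)  # 0.757: a strong semantic match outweighs a weak keyword match
```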
4. Next.js Frontend Integration
```typescript
// app/api/search/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});

const index = pinecone.index('product-search');

export async function POST(request: NextRequest) {
  const startTime = Date.now(); // Track end-to-end latency for the response

  try {
    const { query, filters, topK = 50 } = await request.json();

    // Generate embedding
    const embeddingResponse = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: query,
    });
    const queryEmbedding = embeddingResponse.data[0].embedding;

    // Extract price filter from query
    const priceMatch = query.match(/under\s*\$?(\d+)/i);
    const priceFilter = priceMatch
      ? { price: { $lte: parseFloat(priceMatch[1]) } }
      : {};

    // Search Pinecone
    const searchResults = await index.query({
      vector: queryEmbedding,
      topK,
      includeMetadata: true,
      filter: { ...filters, ...priceFilter },
    });

    // Format results
    const products = searchResults.matches.map(match => ({
      id: match.id,
      score: match.score,
      ...match.metadata,
    }));

    return NextResponse.json({
      success: true,
      query,
      count: products.length,
      products,
      latency: `${Date.now() - startTime}ms`,
    });
  } catch (error) {
    console.error('Search error:', error);
    return NextResponse.json(
      { success: false, error: 'Search failed' },
      { status: 500 }
    );
  }
}
```

React Search Component
```tsx
// components/SemanticSearch.tsx
'use client';

import { useState } from 'react';
import { Search, Loader2 } from 'lucide-react';

interface Product {
  id: string;
  title: string;
  price: number;
  image_url: string;
  score: number;
  category: string;
  brand: string;
}

export default function SemanticSearch() {
  const [query, setQuery] = useState('');
  const [products, setProducts] = useState<Product[]>([]);
  const [loading, setLoading] = useState(false);
  const [latency, setLatency] = useState<string>('');

  const handleSearch = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!query.trim()) return;

    setLoading(true);
    try {
      const response = await fetch('/api/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query, topK: 50 }),
      });
      const data = await response.json();
      if (data.success) {
        setProducts(data.products);
        setLatency(data.latency);
      }
    } catch (error) {
      console.error('Search failed:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="max-w-6xl mx-auto p-6">
      {/* Search Bar */}
      <form onSubmit={handleSearch} className="mb-8">
        <div className="relative">
          <Search className="absolute left-4 top-1/2 -translate-y-1/2 text-gray-400 w-5 h-5" />
          <input
            type="text"
            value={query}
            onChange={(e) => setQuery(e.target.value)}
            placeholder="Try: sustainable winter jacket under $100"
            className="w-full pl-12 pr-4 py-4 text-lg border rounded-lg focus:ring-2 focus:ring-blue-500"
          />
        </div>
        {latency && (
          <p className="mt-2 text-sm text-gray-600">
            Found {products.length} results in {latency}
          </p>
        )}
      </form>

      {/* Loading State */}
      {loading && (
        <div className="flex items-center justify-center py-12">
          <Loader2 className="w-8 h-8 animate-spin text-blue-500" />
        </div>
      )}

      {/* Results Grid */}
      <div className="grid grid-cols-1 md:grid-cols-3 lg:grid-cols-4 gap-6">
        {products.map((product) => (
          <div
            key={product.id}
            className="border rounded-lg overflow-hidden hover:shadow-lg transition-shadow"
          >
            <img
              src={product.image_url}
              alt={product.title}
              className="w-full h-48 object-cover"
            />
            <div className="p-4">
              <h3 className="font-semibold text-sm line-clamp-2 mb-2">
                {product.title}
              </h3>
              <div className="flex items-center justify-between">
                <span className="text-lg font-bold text-blue-600">
                  ${product.price}
                </span>
                <span className="text-xs text-gray-500">
                  {(product.score * 100).toFixed(0)}% match
                </span>
              </div>
              <p className="text-xs text-gray-600 mt-1">
                {product.brand} • {product.category}
              </p>
            </div>
          </div>
        ))}
      </div>
    </div>
  );
}
```

Caching & Performance Optimization
Redis Caching for Popular Queries
```python
import json
from typing import Dict, List

import redis

redis_client = redis.Redis(
    host='localhost',
    port=6379,
    db=0,
    decode_responses=True
)

def search_with_cache(query: str, top_k: int = 50) -> List[Dict]:
    """
    Cache search results for 1 hour to reduce OpenAI API calls.
    """
    cache_key = f"search:{query}:{top_k}"

    # Check cache
    cached_results = redis_client.get(cache_key)
    if cached_results:
        print(f"Cache HIT for query: {query}")
        return json.loads(cached_results)

    # Cache MISS - perform search
    print(f"Cache MISS for query: {query}")
    results = semantic_search(query, top_k)

    # Store in cache (1 hour TTL)
    redis_client.setex(
        cache_key,
        3600,  # 1 hour
        json.dumps(results)
    )
    return results
```

Cache Performance:
- Cache hit rate: 67% (popular queries)
- Latency with cache: 15ms (12x faster than uncached)
- Cost savings: $180/month (67% fewer OpenAI API calls)
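One caveat on the cache key: it uses the raw query string, so "Winter Jacket" and "winter jacket " miss each other's entries. A small normalization step raises the hit rate for free; this sketch (the hashing and helper names are my additions, not part of the original code) shows one way to do it:

```python
import hashlib

def normalize_query(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different
    spellings of the same query share one cache entry."""
    return " ".join(query.lower().split())

def make_cache_key(query: str, top_k: int) -> str:
    # Hash the normalized query to keep keys short and Redis-safe
    digest = hashlib.sha256(normalize_query(query).encode()).hexdigest()[:16]
    return f"search:{digest}:{top_k}"

print(make_cache_key("Sustainable Winter Jacket", 50))
print(make_cache_key("  sustainable   winter jacket ", 50))  # same key
```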
Results & Impact
Metrics Comparison
| Metric | Keyword Search (Before) | Semantic Search (After) | Improvement |
|---|---|---|---|
| Click-Through Rate (CTR) | 4.2% | 5.9% | +40% |
| Avg. Search Results | 23 | 47 | +104% |
| User Satisfaction | 52% | 85% | +63% |
| Zero-Results Rate | 18% | 3% | -83% |
| Avg. Query Latency | 95ms | 180ms | +89% (acceptable trade-off) |
| Revenue per Search | $2.30 | $3.80 | +65% |
Real Query Examples
Query: "laptop for coding students"
Keyword Search Results:
- Gaming Laptop RGB ($1,200) ❌
- 2-in-1 Tablet Laptop ($800) ❌
- Budget Chromebook ($250) ❌

Semantic Search Results:
- Dell XPS 15 Developer Edition ($1,100) ✅
- ThinkPad T14 (16GB RAM, Linux) ($950) ✅
- MacBook Air M2 (programming-optimized) ($1,000) ✅
Query: "sustainable winter jacket under $100"
Keyword Search:
- 3 results (only exact title matches)
- Zero results with "eco-friendly" or "recycled"
Semantic Search:
- 47 results (understands synonyms)
- Includes "eco-friendly", "recycled", "organic", "sustainable"
- Correctly filters price ≤ $100
Key Learnings & Challenges
1. Embedding Quality is Critical
Challenge: Generic product titles → poor embeddings
- ❌ "Men's Jacket - Blue - Size M" (no semantic info)

Solution: Enrich text with attributes
- ✅ "Men's Jacket - Blue - Size M | Waterproof windbreaker with fleece lining, perfect for hiking and outdoor activities"
Result: 23% improvement in relevance scores
2. Cold Start Problem
Challenge: New products have no reviews/ratings for ranking
Solution: Multi-stage ranking
- Stage 1: Semantic similarity (all products equal)
- Stage 2: Boost popular products (num_reviews, rating)
- Stage 3: Personalization (user preferences)
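As a minimal sketch of the stage-2 boost with a cold-start fallback (the category-average prior and the 1,000-review cap are my illustrative assumptions, not details from the production system):

```python
def cold_start_popularity(product: dict, category_avg: float) -> float:
    """
    Stage-2 popularity signal with a cold-start fallback: products
    that have no reviews yet inherit their category's average
    popularity instead of being scored as unpopular.
    """
    num_reviews = product.get("num_reviews", 0)
    if num_reviews == 0:
        return category_avg  # neutral prior for brand-new products
    # Cap at 1,000 reviews so blockbusters don't dominate ranking
    return min(num_reviews, 1000) / 1000

# Usage inside ranking, as a 10% boost on top of semantic similarity:
# final_score = semantic_score + 0.1 * cold_start_popularity(product, 0.2)
```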
3. Cost Optimization
Challenge: 2.3M embeddings × $0.0001 per 1K tokens adds up quickly
Solution:
- Batch processing: Generate embeddings in bulk (cheaper)
- Incremental updates: Only re-embed changed products (see the sketch after this list)
- Cache popular queries: Redis caching (67% hit rate)
- Use smaller models: ada-002 (1536-dim) is far cheaper to store and search than the older davinci-family embedding models (up to 12,288-dim)
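The incremental-update idea is simple to sketch: hash the exact text you embed, and skip any product whose hash hasn't changed since the last run. A minimal sketch under my own assumptions; `embedded_hashes` stands in for whatever persistent store you keep this in, and `create_product_text` is the helper from the indexing section:

```python
import hashlib
from typing import Dict

# Stand-in for a persistent store of product_id -> last embedded text hash
embedded_hashes: Dict[str, str] = {}

def needs_reembedding(product: Dict) -> bool:
    """Re-embed only when the text we embed has actually changed."""
    text_hash = hashlib.sha256(create_product_text(product).encode()).hexdigest()
    if embedded_hashes.get(product["id"]) == text_hash:
        return False  # unchanged: skip the API call entirely
    embedded_hashes[product["id"]] = text_hash
    return True

# changed = [p for p in products if needs_reembedding(p)]
# index_products(changed)  # only pay for what moved
```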
Monthly Costs:
- OpenAI embeddings: ~$45/month (new products + queries)
- Pinecone: ~$70/month (p1.x1 pod)
- Redis: ~$15/month (ElastiCache)
- Total: ~$130/month
4. Handling Multi-Intent Queries
Challenge: "red dress for wedding under $200" has multiple intents
- Semantic: "wedding" ā formal, elegant
- Attribute: "red" ā color filter
- Price: "under $200" ā price filter
Solution: Multi-stage pipeline
- Extract structured filters (color, price, size)
- Generate semantic embedding (remove filters from text)
- Apply filters as metadata in Pinecone
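A minimal sketch of that pipeline, reusing `extract_price_range` from the query-time section; the color vocabulary is a toy stand-in for a real attribute dictionary:

```python
import re

# extract_price_range is defined in the query-time search section above
COLOR_WORDS = {"red", "blue", "black", "white", "green"}  # toy vocabulary

def split_query(query: str) -> dict:
    """Separate structured filters from the text that gets embedded."""
    filters = {}

    # Price: extract the constraint, then drop the phrase from the text
    price = extract_price_range(query)
    text = re.sub(r'(?:under|below|less than|over|above|more than)\s*\$?\d+',
                  '', query, flags=re.IGNORECASE)
    if price:
        filters["price"] = price

    # Color: pull known attribute words out of the semantic text
    tokens = text.lower().split()
    colors = [t for t in tokens if t in COLOR_WORDS]
    if colors:
        filters["color"] = {"$in": colors}
        tokens = [t for t in tokens if t not in COLOR_WORDS]

    return {"semantic_text": " ".join(tokens), "filters": filters}

print(split_query("red dress for wedding under $200"))
# {'semantic_text': 'dress for wedding',
#  'filters': {'price': {'$lte': 200.0}, 'color': {'$in': ['red']}}}
```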
Future Enhancements
- Visual Search - Upload image, find similar products (CLIP embeddings)
- Multi-Modal Search - Combine text + image queries
- Personalization - User history + preferences in ranking
- Cross-Lingual Search - Multilingual embeddings (mE5, SONAR)
- Graph-Based Ranking - Product knowledge graph for related items
- A/B Testing Framework - Systematic relevance improvements
ContextOS Platform Integration
This semantic search implementation is now part of SiramAI's ContextOS platform, which extends it with:
- Multi-Agent Orchestration - Search agents coordinate with enrichment/ranking agents
- Ontology-First Design - Product → Category → Brand relationships in a knowledge graph
- Dynamic Context Optimization - Hybrid retrieval (BM25 + Vector + Graph)
- No-Code Workflows - Visual builder for search pipelines
- Model-Agnostic - Swap OpenAI for Claude, Llama, or custom models
Learn more: SiramAI ContextOS
Conclusion
Building semantic search with vector embeddings transformed product discovery at SiramAI, improving CTR by 40% and user satisfaction by 63%. The combination of Pinecone's fast vector search, OpenAI's powerful embeddings, and hybrid ranking created a search experience that understands user intent, not just keywords.
Key Takeaways:
- Vector embeddings capture semantic meaning better than keyword matching
- Hybrid search (vector + keyword) outperforms either alone
- Rich product text (descriptions, attributes, tags) improves embedding quality
- Caching popular queries reduces costs and latency
- Metadata filtering enables complex queries ("under $100", "in stock")
Tech Stack Summary
Vector Database:
- Pinecone (p1.x1 pod, 2.3M vectors)
Embeddings:
- OpenAI text-embedding-ada-002 (1536-dim)
Backend:
- Next.js API routes (TypeScript)
- Python (indexing scripts)
- Redis (query caching)
- PostgreSQL (product metadata)
Frontend:
- Next.js 14 (App Router)
- React (search UI)
- Tailwind CSS (styling)
Performance:
- 180ms average query latency
- 40% CTR improvement
- 67% cache hit rate
- $130/month total cost