Overview
Traditional keyword-based search fails when users search for products using natural language or concepts that don't exactly match product titles. At SiramAI, I built a semantic search engine on the Pinecone vector database and OpenAI embeddings that understands the meaning behind queries, not just keyword matches.
Key Achievements:
- 40% increase in click-through rate (CTR) for product discovery
- Natural language queries like "sustainable winter jacket under $100"
- <200ms query latency for semantic search
- 2.3M+ product embeddings indexed in Pinecone
- 85% user satisfaction with search relevance (vs. 52% with keyword search)
The Problem with Keyword Search
Traditional Search Limitations
Keyword-based search (like Elasticsearch BM25) relies on exact text matching:
```python
# Traditional keyword search
query = "sustainable winter jacket under $100"
# Searches for products containing: ["sustainable", "winter", "jacket", "$100"]
# Misses: "eco-friendly down coat" (different words, same meaning)
```

Problems:
- Vocabulary Mismatch - "eco-friendly" and "sustainable" are treated as unrelated tokens (semantically the same, different words); see the toy example after this list
- No Context Understanding - can't infer that "winter" implies insulation and warmth
- Price/Attribute Filtering - "under $100" requires separate filter logic
- Synonym Blindness - "coat" vs. "jacket" are seen as completely different
- Multilingual Gap - can't handle cross-language queries
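To see the vocabulary mismatch concretely, here is a toy token-overlap matcher. It is deliberately simplistic (real BM25 adds term and document weighting), but it fails in exactly the same way:

```python
def keyword_match(query: str, title: str) -> int:
    """Count shared tokens between query and product title (toy example)."""
    query_tokens = set(query.lower().split())
    title_tokens = set(title.lower().split())
    return len(query_tokens & title_tokens)

query = "sustainable winter jacket"
print(keyword_match(query, "Sustainable Winter Jacket - Blue"))  # 3 -> matched
print(keyword_match(query, "Eco-Friendly Down Coat"))            # 0 -> missed entirely
```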
Real User Query Examples
| User Query | Keyword Search Result | Semantic Search Result |
|---|---|---|
| "sustainable winter jacket under $100" | ā 3 results (exact title match) | ā 47 results (understands eco-friendly, coat, etc.) |
| "laptop for coding students" | ā Shows gaming laptops | ā Shows programming-optimized laptops |
| "formal shoes that won't hurt my feet" | ā Only finds "formal shoes" | ā Finds comfortable dress shoes |
Solution: Semantic Search with Vector Embeddings
How Vector Embeddings Work
Vector embeddings convert text into high-dimensional numerical representations where semantically similar items are close together:
```python
# Text → vector (1536 dimensions with OpenAI ada-002)
embedding("sustainable jacket")   # [0.021, -0.15, 0.82, ...]
embedding("eco-friendly coat")    # [0.019, -0.14, 0.81, ...]  <- close!
embedding("bicycle helmet")       # [0.91, 0.42, -0.33, ...]   <- far!

# Cosine similarity
similarity("sustainable jacket", "eco-friendly coat")  # 0.94 (very similar)
similarity("sustainable jacket", "bicycle helmet")     # 0.12 (not similar)
```

Key Insight: the model learns that "sustainable" and "eco-friendly" are synonyms and that "jacket" and "coat" are similar, even though the words themselves are different.
System Architecture

```
┌─────────────────┐
│   User Query    │  "sustainable winter jacket under $100"
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│          Query Processing Pipeline              │
│  1. Text preprocessing                          │
│  2. OpenAI embedding (1536-dim vector)          │
│  3. Attribute extraction (price, category)      │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│          Pinecone Vector Database               │
│  • 2.3M product embeddings                      │
│  • Cosine similarity search                     │
│  • Metadata filtering (price, brand, etc.)      │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│               Hybrid Ranking                    │
│  • Vector similarity (70%)                      │
│  • BM25 keyword score (20%)                     │
│  • Popularity boost (10%)                       │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│ Top 50 Results  │  ← re-ranked by relevance
└─────────────────┘
```
Implementation Details
1. Generating Product Embeddings
First, we create embeddings for all products in the catalog:
```python
import os
from typing import List, Dict

import openai
import pinecone

# Initialize OpenAI
openai.api_key = os.getenv("OPENAI_API_KEY")

# Initialize Pinecone
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment="us-west1-gcp"
)

# Create index (1536 dimensions for ada-002)
index_name = "product-search"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",   # Cosine similarity for semantic search
        pod_type="p1.x1"   # Performance tier
    )

index = pinecone.Index(index_name)


def create_product_text(product: Dict) -> str:
    """
    Combine product attributes into rich text for embedding.
    More context = better semantic understanding.
    """
    text_parts = [
        product.get("title", ""),
        product.get("description", ""),
        f"Category: {product.get('category', '')}",
        f"Brand: {product.get('brand', '')}",
        f"Material: {product.get('material', '')}",
        f"Color: {product.get('color', '')}",
        # Include tags for semantic richness
        " ".join(product.get("tags", []))
    ]
    return " ".join(filter(None, text_parts))


def get_embedding(text: str) -> List[float]:
    """
    Generate an embedding using the OpenAI ada-002 model.
    Cost: $0.0001 per 1K tokens (~750 words)
    """
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response["data"][0]["embedding"]


def index_products(products: List[Dict], batch_size: int = 100):
    """
    Index products into Pinecone in batches for efficiency.
    """
    vectors = []
    indexed = 0
    for product in products:
        # Create rich text representation
        product_text = create_product_text(product)

        # Generate embedding
        embedding = get_embedding(product_text)

        # Prepare vector with metadata
        vectors.append({
            "id": product["id"],
            "values": embedding,
            "metadata": {
                "title": product["title"],
                "price": product["price"],
                "category": product["category"],
                "brand": product["brand"],
                "image_url": product["image_url"],
                "stock": product["stock"],
                "rating": product.get("rating", 0),
                "num_reviews": product.get("num_reviews", 0)
            }
        })

        # Batch upsert to Pinecone
        if len(vectors) >= batch_size:
            index.upsert(vectors=vectors)
            indexed += len(vectors)
            vectors = []
            print(f"Indexed {indexed} products...")

    # Upsert remaining
    if vectors:
        index.upsert(vectors=vectors)

    stats = index.describe_index_stats()
    print(f"Total products indexed: {stats['total_vector_count']}")

# Example: Index 2.3M products
# products = load_products_from_database()  # Your product catalog
# index_products(products)
```

Indexing Performance:
- Batch size: 100 products/batch
- Throughput: ~500 products/second
- Total time for 2.3M products: ~75 minutes
- Cost: ~$23 (2.3M embeddings × $0.0001 per 1K tokens)
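The ~500 products/second figure relies on batching at the API level as well: the legacy OpenAI Embedding endpoint accepts a list of inputs, so a whole batch can be embedded in one request. A hedged sketch (this helper is my addition; it assumes the client is configured as in the indexing script above):

```python
from typing import List

import openai  # api_key configured as in the indexing script

def get_embeddings_batch(texts: List[str]) -> List[List[float]]:
    """
    Embed many texts in one API call. The legacy Embedding endpoint
    returns one vector per input; each item carries an 'index' field
    matching its input position, so we sort to preserve order.
    """
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=texts
    )
    items = sorted(response["data"], key=lambda d: d["index"])
    return [item["embedding"] for item in items]

# Usage: embed one batch of 100 product texts in a single request
# product_texts = [create_product_text(p) for p in products[:100]]
# vectors = get_embeddings_batch(product_texts)
```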
2. Query-Time Search
When a user searches, we embed their query and find similar products:
```python
import re
from typing import List, Dict, Optional

def extract_price_range(query: str) -> Optional[Dict]:
    """
    Extract price constraints from natural language.
    Examples: "under $100", "between $50 and $200", "less than 500"
    """
    # Pattern: under/below/less than $X
    match = re.search(r'(?:under|below|less than)\s*\$?(\d+)', query, re.IGNORECASE)
    if match:
        return {"$lte": float(match.group(1))}

    # Pattern: over/above/more than $X
    match = re.search(r'(?:over|above|more than)\s*\$?(\d+)', query, re.IGNORECASE)
    if match:
        return {"$gte": float(match.group(1))}

    # Pattern: between $X and $Y
    match = re.search(r'between\s*\$?(\d+)\s*and\s*\$?(\d+)', query, re.IGNORECASE)
    if match:
        return {"$gte": float(match.group(1)), "$lte": float(match.group(2))}

    return None


def semantic_search(
    query: str,
    top_k: int = 50,
    filters: Optional[Dict] = None
) -> List[Dict]:
    """
    Perform semantic search with optional metadata filtering.
    """
    # Extract structured filters from query
    price_filter = extract_price_range(query)

    # Build metadata filter
    metadata_filter = filters or {}
    if price_filter:
        metadata_filter["price"] = price_filter

    # Generate query embedding
    query_embedding = get_embedding(query)

    # Search Pinecone
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True,
        filter=metadata_filter if metadata_filter else None
    )

    # Format results
    products = []
    for match in results["matches"]:
        products.append({
            "id": match["id"],
            "score": match["score"],  # Cosine similarity (0-1)
            "title": match["metadata"]["title"],
            "price": match["metadata"]["price"],
            "category": match["metadata"]["category"],
            "brand": match["metadata"]["brand"],
            "image_url": match["metadata"]["image_url"],
            "rating": match["metadata"].get("rating", 0),
        })
    return products

# Example query
results = semantic_search(
    query="sustainable winter jacket under $100",
    top_k=50
)

for product in results[:5]:
    print(f"{product['title']} - ${product['price']} (score: {product['score']:.3f})")
```

Example Output:
```
Patagonia Recycled Down Parka - $95 (score: 0.912)
North Face Eco ThermoBall Jacket - $89 (score: 0.897)
Columbia Omni-Heat Winter Coat - $79 (score: 0.885)
REI Co-op Sustainable Puffer - $99 (score: 0.871)
Marmot EcoDry Shell Jacket - $92 (score: 0.865)
```
Search Performance:
- Query latency: 180ms average (p99: 250ms)
- Pinecone query: 120ms
- OpenAI embedding: 60ms
- Relevance score: 0.85+ for top results
3. Hybrid Search (Vector + Keyword)
Pure semantic search can miss exact matches. We combine vector search with traditional keyword search:
```python
from typing import Dict, List

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

def hybrid_search(
    query: str,
    top_k: int = 50,
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3
) -> List[Dict]:
    """
    Combine semantic search (Pinecone) with keyword search (Elasticsearch).
    Weights: 70% semantic, 30% keyword for balanced relevance.
    """
    # 1. Semantic search (Pinecone)
    semantic_results = semantic_search(query, top_k=top_k)
    semantic_scores = {r["id"]: r["score"] for r in semantic_results}

    # 2. Keyword search (Elasticsearch BM25)
    es_response = es.search(
        index="products",
        body={
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["title^3", "description", "category", "brand"],
                    "type": "best_fields"
                }
            },
            "size": top_k
        }
    )

    keyword_scores = {}
    for hit in es_response["hits"]["hits"]:
        # Normalize BM25 score to 0-1 range
        normalized_score = hit["_score"] / es_response["hits"]["max_score"]
        keyword_scores[hit["_id"]] = normalized_score

    # 3. Combine scores with a weighted average
    all_product_ids = set(semantic_scores.keys()) | set(keyword_scores.keys())
    hybrid_results = []
    for product_id in all_product_ids:
        semantic_score = semantic_scores.get(product_id, 0)
        keyword_score = keyword_scores.get(product_id, 0)

        # Weighted average
        final_score = (
            vector_weight * semantic_score +
            keyword_weight * keyword_score
        )

        # Add popularity boost (10%)
        # product = get_product_by_id(product_id)
        # popularity_score = product["num_reviews"] / 1000  # Normalize
        # final_score += 0.1 * popularity_score

        hybrid_results.append({
            "id": product_id,
            "score": final_score,
            "semantic_score": semantic_score,
            "keyword_score": keyword_score
        })

    # Sort by final score
    hybrid_results.sort(key=lambda x: x["score"], reverse=True)
    return hybrid_results[:top_k]

# Example
results = hybrid_search("sustainable winter jacket under $100")
```

Why Hybrid Works:
- Semantic search: Catches synonyms, concepts ("eco-friendly" = "sustainable")
- Keyword search: Ensures exact brand names, SKUs are prioritized
- Best of both worlds: +40% CTR over keyword-only search and +28% over semantic-only; a worked example of the weighting follows
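To make the weighting concrete, here is a worked example of the score combination for one product (the individual scores are illustrative, not measurements):

```python
# Worked example of the hybrid weighting for a single product
vector_weight, keyword_weight = 0.7, 0.3

semantic_score = 0.91  # cosine similarity from Pinecone
keyword_score = 0.40   # normalized BM25 score from Elasticsearch

final_score = vector_weight * semantic_score + keyword_weight * keyword_score
print(final_score)  # 0.757: a strong semantic match outweighs a weak keyword match
```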
4. Next.js Frontend Integration
```typescript
// app/api/search/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});

const index = pinecone.index('product-search');

export async function POST(request: NextRequest) {
  const startTime = Date.now(); // Track end-to-end latency for the response

  try {
    const { query, filters, topK = 50 } = await request.json();

    // Generate embedding
    const embeddingResponse = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: query,
    });
    const queryEmbedding = embeddingResponse.data[0].embedding;

    // Extract price filter from query
    const priceMatch = query.match(/under\s*\$?(\d+)/i);
    const priceFilter = priceMatch
      ? { price: { $lte: parseFloat(priceMatch[1]) } }
      : {};

    // Search Pinecone
    const searchResults = await index.query({
      vector: queryEmbedding,
      topK,
      includeMetadata: true,
      filter: { ...filters, ...priceFilter },
    });

    // Format results
    const products = searchResults.matches.map(match => ({
      id: match.id,
      score: match.score,
      ...match.metadata,
    }));

    return NextResponse.json({
      success: true,
      query,
      count: products.length,
      products,
      latency: `${Date.now() - startTime}ms`,
    });
  } catch (error) {
    console.error('Search error:', error);
    return NextResponse.json(
      { success: false, error: 'Search failed' },
      { status: 500 }
    );
  }
}
```

React Search Component
```tsx
// components/SemanticSearch.tsx
'use client';

import { useState } from 'react';
import { Search, Loader2 } from 'lucide-react';

interface Product {
  id: string;
  title: string;
  price: number;
  image_url: string;
  score: number;
  category: string;
  brand: string;
}

export default function SemanticSearch() {
  const [query, setQuery] = useState('');
  const [products, setProducts] = useState<Product[]>([]);
  const [loading, setLoading] = useState(false);
  const [latency, setLatency] = useState<string>('');

  const handleSearch = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!query.trim()) return;

    setLoading(true);
    try {
      const response = await fetch('/api/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query, topK: 50 }),
      });
      const data = await response.json();
      if (data.success) {
        setProducts(data.products);
        setLatency(data.latency);
      }
    } catch (error) {
      console.error('Search failed:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="max-w-6xl mx-auto p-6">
      {/* Search Bar */}
      <form onSubmit={handleSearch} className="mb-8">
        <div className="relative">
          <Search className="absolute left-4 top-1/2 -translate-y-1/2 text-gray-400 w-5 h-5" />
          <input
            type="text"
            value={query}
            onChange={(e) => setQuery(e.target.value)}
            placeholder="Try: sustainable winter jacket under $100"
            className="w-full pl-12 pr-4 py-4 text-lg border rounded-lg focus:ring-2 focus:ring-blue-500"
          />
        </div>
        {latency && (
          <p className="mt-2 text-sm text-gray-600">
            Found {products.length} results in {latency}
          </p>
        )}
      </form>

      {/* Loading State */}
      {loading && (
        <div className="flex items-center justify-center py-12">
          <Loader2 className="w-8 h-8 animate-spin text-blue-500" />
        </div>
      )}

      {/* Results Grid */}
      <div className="grid grid-cols-1 md:grid-cols-3 lg:grid-cols-4 gap-6">
        {products.map((product) => (
          <div
            key={product.id}
            className="border rounded-lg overflow-hidden hover:shadow-lg transition-shadow"
          >
            <img
              src={product.image_url}
              alt={product.title}
              className="w-full h-48 object-cover"
            />
            <div className="p-4">
              <h3 className="font-semibold text-sm line-clamp-2 mb-2">
                {product.title}
              </h3>
              <div className="flex items-center justify-between">
                <span className="text-lg font-bold text-blue-600">
                  ${product.price}
                </span>
                <span className="text-xs text-gray-500">
                  {(product.score * 100).toFixed(0)}% match
                </span>
              </div>
              <p className="text-xs text-gray-600 mt-1">
                {product.brand} • {product.category}
              </p>
            </div>
          </div>
        ))}
      </div>
    </div>
  );
}
```

Caching & Performance Optimization
Redis Caching for Popular Queries
```python
import json
from typing import Dict, List

import redis

redis_client = redis.Redis(
    host='localhost',
    port=6379,
    db=0,
    decode_responses=True
)

def search_with_cache(query: str, top_k: int = 50) -> List[Dict]:
    """
    Cache search results for 1 hour to reduce OpenAI API calls.
    """
    cache_key = f"search:{query}:{top_k}"

    # Check cache
    cached_results = redis_client.get(cache_key)
    if cached_results:
        print(f"Cache HIT for query: {query}")
        return json.loads(cached_results)

    # Cache MISS - perform search
    print(f"Cache MISS for query: {query}")
    results = semantic_search(query, top_k)

    # Store in cache (1 hour TTL)
    redis_client.setex(
        cache_key,
        3600,  # 1 hour
        json.dumps(results)
    )
    return results
```

Cache Performance:
- Cache hit rate: 67% (popular queries)
- Latency with cache: 15ms (12x faster than uncached)
- Cost savings: $180/month (67% fewer OpenAI API calls)
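One caveat on the cache key: it uses the raw query string, so "Winter Jacket" and "winter jacket " miss each other's entries. A small normalization step raises the hit rate for free; this sketch (the hashing and helper names are my additions, not part of the original code) shows one way to do it:

```python
import hashlib

def normalize_query(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different
    spellings of the same query share one cache entry."""
    return " ".join(query.lower().split())

def make_cache_key(query: str, top_k: int) -> str:
    # Hash the normalized query to keep keys short and Redis-safe
    digest = hashlib.sha256(normalize_query(query).encode()).hexdigest()[:16]
    return f"search:{digest}:{top_k}"

print(make_cache_key("Sustainable Winter Jacket", 50))
print(make_cache_key("  sustainable   winter jacket ", 50))  # same key
```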
Results & Impact
Metrics Comparison
| Metric | Keyword Search (Before) | Semantic Search (After) | Improvement |
|---|---|---|---|
| Click-Through Rate (CTR) | 4.2% | 5.9% | +40% |
| Avg. Search Results | 23 | 47 | +104% |
| User Satisfaction | 52% | 85% | +63% |
| Zero-Results Rate | 18% | 3% | -83% |
| Avg. Query Latency | 95ms | 180ms | +89% (acceptable trade-off) |
| Revenue per Search | $2.30 | $3.80 | +65% |
Real Query Examples
Query: "laptop for coding students"
Keyword Search Results:
- Gaming Laptop RGB ($1,200) ❌
- 2-in-1 Tablet Laptop ($800) ❌
- Budget Chromebook ($250) ❌

Semantic Search Results:
- Dell XPS 15 Developer Edition ($1,100) ✅
- ThinkPad T14 (16GB RAM, Linux) ($950) ✅
- MacBook Air M2 (programming-optimized) ($1,000) ✅
Query: "sustainable winter jacket under $100"
Keyword Search:
- 3 results (only exact title matches)
- Zero results with "eco-friendly" or "recycled"
Semantic Search:
- 47 results (understands synonyms)
- Includes "eco-friendly", "recycled", "organic", "sustainable"
- Correctly filters price ≤ $100
Key Learnings & Challenges
1. Embedding Quality is Critical
Challenge: Generic product titles → poor embeddings
- ❌ "Men's Jacket - Blue - Size M" (no semantic info)

Solution: Enrich text with attributes
- ✅ "Men's Jacket - Blue - Size M | Waterproof windbreaker with fleece lining, perfect for hiking and outdoor activities"
Result: 23% improvement in relevance scores
2. Cold Start Problem
Challenge: New products have no reviews/ratings for ranking
Solution: Multi-stage ranking
- Stage 1: Semantic similarity (all products equal)
- Stage 2: Boost popular products (num_reviews, rating)
- Stage 3: Personalization (user preferences)
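As a minimal sketch of the stage-2 boost with a cold-start fallback (the category-average prior and the 1,000-review cap are my illustrative assumptions, not details from the production system):

```python
def cold_start_popularity(product: dict, category_avg: float) -> float:
    """
    Stage-2 popularity signal with a cold-start fallback: products
    that have no reviews yet inherit their category's average
    popularity instead of being scored as unpopular.
    """
    num_reviews = product.get("num_reviews", 0)
    if num_reviews == 0:
        return category_avg  # neutral prior for brand-new products
    # Cap at 1,000 reviews so blockbusters don't dominate ranking
    return min(num_reviews, 1000) / 1000

# Usage inside ranking, as a 10% boost on top of semantic similarity:
# final_score = semantic_score + 0.1 * cold_start_popularity(product, 0.2)
```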
3. Cost Optimization
Challenge: 2.3M embeddings × $0.0001 per 1K tokens adds up quickly
Solution:
- Batch processing: Generate embeddings in bulk (cheaper)
- Incremental updates: Only re-embed changed products (see the sketch after this list)
- Cache popular queries: Redis caching (67% hit rate)
- Use smaller models: ada-002 (1536-dim) is far cheaper to store and search than the older davinci-family embedding models (up to 12,288-dim)
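The incremental-update idea is simple to sketch: hash the exact text you embed, and skip any product whose hash hasn't changed since the last run. A minimal sketch under my own assumptions; `embedded_hashes` stands in for whatever persistent store you keep this in, and `create_product_text` is the helper from the indexing section:

```python
import hashlib
from typing import Dict

# Stand-in for a persistent store of product_id -> last embedded text hash
embedded_hashes: Dict[str, str] = {}

def needs_reembedding(product: Dict) -> bool:
    """Re-embed only when the text we embed has actually changed."""
    text_hash = hashlib.sha256(create_product_text(product).encode()).hexdigest()
    if embedded_hashes.get(product["id"]) == text_hash:
        return False  # unchanged: skip the API call entirely
    embedded_hashes[product["id"]] = text_hash
    return True

# changed = [p for p in products if needs_reembedding(p)]
# index_products(changed)  # only pay for what moved
```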
Monthly Costs:
- OpenAI embeddings: ~$45/month (new products + queries)
- Pinecone: ~$70/month (p1.x1 pod)
- Redis: ~$15/month (ElastiCache)
- Total: ~$130/month
4. Handling Multi-Intent Queries
Challenge: "red dress for wedding under $200" has multiple intents
- Semantic: "wedding" ā formal, elegant
- Attribute: "red" ā color filter
- Price: "under $200" ā price filter
Solution: Multi-stage pipeline
- Extract structured filters (color, price, size)
- Generate semantic embedding (remove filters from text)
- Apply filters as metadata in Pinecone
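A minimal sketch of that pipeline, reusing `extract_price_range` from the query-time section; the color vocabulary is a toy stand-in for a real attribute dictionary:

```python
import re

# extract_price_range is defined in the query-time search section above
COLOR_WORDS = {"red", "blue", "black", "white", "green"}  # toy vocabulary

def split_query(query: str) -> dict:
    """Separate structured filters from the text that gets embedded."""
    filters = {}

    # Price: extract the constraint, then drop the phrase from the text
    price = extract_price_range(query)
    text = re.sub(r'(?:under|below|less than|over|above|more than)\s*\$?\d+',
                  '', query, flags=re.IGNORECASE)
    if price:
        filters["price"] = price

    # Color: pull known attribute words out of the semantic text
    tokens = text.lower().split()
    colors = [t for t in tokens if t in COLOR_WORDS]
    if colors:
        filters["color"] = {"$in": colors}
        tokens = [t for t in tokens if t not in COLOR_WORDS]

    return {"semantic_text": " ".join(tokens), "filters": filters}

print(split_query("red dress for wedding under $200"))
# {'semantic_text': 'dress for wedding',
#  'filters': {'price': {'$lte': 200.0}, 'color': {'$in': ['red']}}}
```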
Future Enhancements
- Visual Search - Upload image, find similar products (CLIP embeddings)
- Multi-Modal Search - Combine text + image queries
- Personalization - User history + preferences in ranking
- Cross-Lingual Search - Multilingual embeddings (mE5, SONAR)
- Graph-Based Ranking - Product knowledge graph for related items
- A/B Testing Framework - Systematic relevance improvements
ContextOS Platform Integration
This semantic search implementation is now part of SiramAI's ContextOS platform, which extends it with:
- Multi-Agent Orchestration - Search agents coordinate with enrichment/ranking agents
- Ontology-First Design - Product → Category → Brand relationships in a knowledge graph
- Dynamic Context Optimization - Hybrid retrieval (BM25 + Vector + Graph)
- No-Code Workflows - Visual builder for search pipelines
- Model-Agnostic - Swap OpenAI for Claude, Llama, or custom models
Learn more: SiramAI ContextOS
Conclusion
Building semantic search with vector embeddings transformed product discovery at SiramAI, improving CTR by 40% and user satisfaction by 63%. The combination of Pinecone's fast vector search, OpenAI's powerful embeddings, and hybrid ranking created a search experience that understands user intent, not just keywords.
Key Takeaways:
- Vector embeddings capture semantic meaning better than keyword matching
- Hybrid search (vector + keyword) outperforms either alone
- Rich product text (descriptions, attributes, tags) improves embedding quality
- Caching popular queries reduces costs and latency
- Metadata filtering enables complex queries ("under $100", "in stock")
Tech Stack Summary
Vector Database:
- Pinecone (p1.x1 pod, 2.3M vectors)
Embeddings:
- OpenAI text-embedding-ada-002 (1536-dim)
Backend:
- Next.js API routes (TypeScript)
- Python (indexing scripts)
- Redis (query caching)
- PostgreSQL (product metadata)
Frontend:
- Next.js 14 (App Router)
- React (search UI)
- Tailwind CSS (styling)
Performance:
- 180ms average query latency
- 40% CTR improvement
- 67% cache hit rate
- $130/month total cost