Local businesses waste 15-20 hours per week manually prospecting—scraping directories, researching companies, crafting cold emails—only to get 2-3% response rates. For Lume (District Four), a digital marketing agency serving 15+ local businesses, this manual process couldn't scale.
I built an AI agent system using LangChain that automates the entire lead generation pipeline: scraping business directories, enriching data with web research, scoring leads with ML, and generating hyper-personalized outreach emails using GPT-4. The system processes 5,000+ prospects monthly and achieved a 3x increase in client acquisition rate.
Here's how I architected an agentic AI system that turned cold outreach from a time sink into a revenue driver—delivering 300% user growth for FitCheck, $2k+ monthly revenue for Workwear, and 2x ROI for Gloss Authority.
The Problem: Manual Lead Gen Doesn't Scale
Traditional Lead Generation is Broken
Local businesses (restaurants, gyms, salons, boutiques) need consistent customer acquisition, but:
- Manual prospecting: 3-4 hours finding leads on Yelp, Google Maps, directories
- Data enrichment: 2-3 hours researching each prospect (website, social media, reviews)
- Email crafting: 10-15 minutes per personalized email
- Low response rates: 1-3% cold email response rate
- No follow-up: 80% of leads never get follow-up emails
For Lume's 15 clients, this meant:
- 15 businesses × 20 hours/week = 300 hours/week of manual work
- Cost: $6,000-9,000/week in labor
- Acquisition rate: 2-4 clients per month
The opportunity: automate the pipeline with AI agents to scale to thousands of prospects while maintaining personalization.
Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│             AI Agent Orchestration (LangChain)              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   Scraper    │   │  Enrichment  │   │   Outreach   │     │
│  │    Agent     │ → │    Agent     │ → │    Agent     │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
└──────────────────────────────┬──────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                        Data Pipeline                        │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   Business   │   │   Website    │   │    Social    │     │
│  │ Directories  │   │   Scraper    │   │  Media API   │     │
│  │(Yelp, GMaps) │   │              │   │ (Instagram)  │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
└──────────────────────────────┬──────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                  Lead Scoring & Enrichment                  │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │  ML Scoring  │   │    GPT-4     │   │    Email     │     │
│  │    Model     │   │   Summary    │   │  Validation  │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
└──────────────────────────────┬──────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                    Email Outreach Engine                    │
│  - GPT-4 personalized email generation                      │
│  - A/B testing (5 variants per campaign)                    │
│  - Follow-up sequence (3 emails: days 3, 7, 14)             │
│  - SendGrid API integration                                 │
└──────────────────────────────┬──────────────────────────────┘
                               ↓
                   MongoDB (Leads Database)
               + React Dashboard (Client Portal)
```
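The Email Outreach Engine box lists A/B testing with five variants per campaign and a follow-up sequence. Below is a minimal sketch of both, under a few assumptions. Each lead has a stable string ID. Variants are assigned by hashing the campaign and lead IDs. Follow-ups go out 3, 7, and 14 days after the first email, which are the days used in `generate_followup_sequence` later in this post. `assign_variant`, `best_variant`, and `followup_dates` are hypothetical helpers and are not part of the original codebase.

```python
import hashlib
from datetime import date, timedelta
from typing import List, Sequence

NUM_VARIANTS = 5               # "5 variants per campaign"
FOLLOWUP_OFFSETS = (3, 7, 14)  # days after the initial email


def assign_variant(lead_id: str, campaign_id: str, num_variants: int = NUM_VARIANTS) -> int:
    """Deterministically map a lead to a variant index in [0, num_variants)."""
    digest = hashlib.sha256(f"{campaign_id}:{lead_id}".encode()).hexdigest()
    return int(digest, 16) % num_variants


def best_variant(sent: Sequence[int], replies: Sequence[int]) -> int:
    """Index of the variant with the highest observed reply rate."""
    rates = [r / s if s else 0.0 for s, r in zip(sent, replies)]
    return max(range(len(rates)), key=lambda i: rates[i])


def followup_dates(initial_send: date, offsets: Sequence[int] = FOLLOWUP_OFFSETS) -> List[date]:
    """Send date for each follow-up in the sequence."""
    return [initial_send + timedelta(days=d) for d in offsets]


# Example usage
variant = assign_variant("lead-123", "gyms-nyc")  # same inputs always give the same variant
print(variant, followup_dates(date(2024, 3, 1)))
```

The sketch hashes rather than assigning at random, so a lead lands in the same variant every time the pipeline reruns and its follow-ups stay consistent with the first email. `best_variant` only compares raw reply rates. With small samples, a significance test or a bandit approach is a safer way to pick a winner.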
Implementation
1. LangChain Agent Orchestration
Built a multi-agent system where specialized agents handle different tasks:
```python
# agents/lead_generation_agent.py
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List

from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import StructuredTool
from langchain_openai import ChatOpenAI


class LeadGenerationAgent:
    """
    Multi-agent system for automated lead generation

    Agents:
    1. Scraper Agent - Find prospects from directories
    2. Enrichment Agent - Research and score leads
    3. Outreach Agent - Generate personalized emails
    """

    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0.7)
        # Initialize sub-agents
        self.scraper_agent = self._create_scraper_agent()
        self.enrichment_agent = self._create_enrichment_agent()
        self.outreach_agent = self._create_outreach_agent()

    @staticmethod
    def _tool(func: Callable, name: str, description: str) -> StructuredTool:
        # StructuredTool supports multi-argument functions; a plain Tool passes a single string
        return StructuredTool.from_function(func=func, name=name, description=description)

    def _build_agent(self, tools: List[StructuredTool], system_prompt: str) -> AgentExecutor:
        """Wrap a tool list and a system prompt in an OpenAI-functions agent"""
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_prompt),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),
        ])
        agent = create_openai_functions_agent(self.llm, tools, prompt)
        return AgentExecutor(agent=agent, tools=tools, verbose=True)

    def _create_scraper_agent(self) -> AgentExecutor:
        """
        Agent that scrapes business directories

        Tools:
        - search_yelp: Find businesses on Yelp
        - search_google_maps: Find businesses on Google Maps
        - extract_contacts: Parse contact details from websites
        """
        tools = [
            self._tool(self.search_yelp, "search_yelp",
                       "Search Yelp for businesses in a specific category and location"),
            self._tool(self.search_google_maps, "search_google_maps",
                       "Search Google Maps for businesses"),
            self._tool(self.extract_contact_info, "extract_contacts",
                       "Extract email and phone from business website"),
        ]
        return self._build_agent(tools, """You are a business research agent. Your job is to:
1. Search directories for businesses matching the target criteria
2. Extract complete contact information
3. Validate that businesses are operational
4. Return a structured list of prospects as a JSON array""")

    def _create_enrichment_agent(self) -> AgentExecutor:
        """
        Agent that enriches lead data with research

        Tools:
        - scrape_website: Extract key info from business website
        - check_social_media: Get social media presence
        - analyze_reviews: Summarize customer sentiment
        - score_lead: Calculate lead quality score
        """
        tools = [
            self._tool(self.scrape_website, "scrape_website",
                       "Scrape and summarize a business website"),
            self._tool(self.check_social_media, "check_social_media",
                       "Check Instagram, Facebook presence and follower count"),
            self._tool(self.analyze_reviews, "analyze_reviews",
                       "Analyze Google/Yelp reviews for pain points"),
            self._tool(self.score_lead, "score_lead", "Score lead quality (0-100)"),
        ]
        return self._build_agent(tools, """You are a lead enrichment agent. Your job is to:
1. Research each prospect thoroughly
2. Identify their pain points and opportunities
3. Score lead quality based on criteria
4. Provide actionable insights for outreach
Respond with a JSON object with keys: score (0-100), pain_points (list), website_summary""")

    def _create_outreach_agent(self) -> AgentExecutor:
        """
        Agent that generates personalized outreach

        Tools:
        - generate_email: Create personalized email
        - generate_subject: Create compelling subject line
        - schedule_followup: Create follow-up sequence
        """
        tools = [
            self._tool(self.generate_personalized_email, "generate_email",
                       "Generate personalized cold email based on research"),
            self._tool(self.generate_subject_line, "generate_subject",
                       "Generate attention-grabbing subject line"),
            self._tool(self.schedule_followup_sequence, "schedule_followup",
                       "Create 3-email follow-up sequence"),
        ]
        return self._build_agent(tools, """You are an expert copywriter. Your job is to:
1. Write hyper-personalized cold emails that convert
2. Reference specific details about the prospect
3. Highlight relevant case studies and results
4. Create compelling subject lines
5. Follow proven cold email frameworks (AIDA, PAS)
Respond with a JSON object with keys: email, subject, followups""")

    @staticmethod
    def _parse_output(result: Dict):
        """AgentExecutor returns {"input": ..., "output": "<text>"}; the prompts ask for JSON"""
        return json.loads(result["output"])

    async def generate_leads(
        self,
        business_type: str,
        location: str,
        count: int = 100
    ) -> List[Dict]:
        """
        Main pipeline: Scrape → Enrich → Generate outreach

        Args:
            business_type: "restaurant", "gym", "salon", etc.
            location: "New York, NY"
            count: Number of leads to generate

        Returns:
            List of enriched leads with outreach emails
        """
        # Step 1: Scrape prospects
        print(f"🔍 Scraping {count} {business_type} businesses in {location}...")
        result = await self.scraper_agent.ainvoke({
            "input": f"Find {count} {business_type} businesses in {location}. "
                     f"Extract name, address, phone, email, website. Return a JSON array."
        })
        prospects = self._parse_output(result)

        # Step 2: Enrich each prospect
        print(f"📊 Enriching {len(prospects)} prospects...")
        enriched_leads = []
        for prospect in prospects:
            result = await self.enrichment_agent.ainvoke({
                "input": f"Research {prospect['name']} ({prospect.get('website', 'no website')}). "
                         f"Analyze their website, social media, and reviews. "
                         f"Identify pain points and score lead quality."
            })
            enriched_leads.append({
                **prospect,
                **self._parse_output(result),
                'enriched_at': datetime.now(timezone.utc)
            })

        # Step 3: Generate outreach only for high-quality leads (score >= 70)
        print("✉️ Generating outreach emails...")
        qualified_leads = [lead for lead in enriched_leads if lead.get('score', 0) >= 70]
        for lead in qualified_leads:
            result = await self.outreach_agent.ainvoke({
                "input": f"Create personalized cold email for {lead['name']}. "
                         f"Pain points: {lead['pain_points']}. "
                         f"Their website: {lead['website_summary']}. "
                         f"Our case study: FitCheck achieved 300% user growth."
            })
            outreach = self._parse_output(result)
            lead['email_content'] = outreach['email']
            lead['subject_line'] = outreach['subject']
            lead['followup_sequence'] = outreach['followups']
        return qualified_leads

    # Tool implementations (stubs; type hints let StructuredTool build each tool's schema)
    def search_yelp(self, business_type: str, location: str) -> List[Dict]:
        """Scrape Yelp for businesses"""
        raise NotImplementedError  # Yelp API or web scraping (see BusinessScraper)

    def search_google_maps(self, query: str, location: str) -> List[Dict]:
        """Scrape Google Maps for businesses"""
        raise NotImplementedError

    def extract_contact_info(self, url: str) -> Dict:
        """Extract email and phone from a business website"""
        raise NotImplementedError

    def scrape_website(self, url: str) -> Dict:
        """Scrape and summarize business website"""
        raise NotImplementedError  # BeautifulSoup + GPT-4

    def check_social_media(self, business_name: str) -> Dict:
        """Look up Instagram/Facebook presence and follower counts"""
        raise NotImplementedError

    def analyze_reviews(self, business_name: str) -> Dict:
        """Summarize review sentiment and recurring complaints"""
        raise NotImplementedError

    def score_lead(self, lead_data: Dict) -> float:
        """Score lead quality 0-100 (see LeadScorer)"""
        raise NotImplementedError

    def generate_personalized_email(self, lead_data: Dict) -> str:
        """Generate personalized cold email with GPT-4 (see EmailGenerator)"""
        raise NotImplementedError

    def generate_subject_line(self, lead_data: Dict) -> str:
        """Generate a subject line for the cold email"""
        raise NotImplementedError

    def schedule_followup_sequence(self, lead_data: Dict) -> List[Dict]:
        """Build the follow-up sequence (see EmailGenerator.generate_followup_sequence)"""
        raise NotImplementedError
```

2. Web Scraping Pipeline
Scrape business directories with rotating proxies and anti-detection:
```python
# scrapers/business_scraper.py
import random
import re
import time
from typing import Dict, List, Optional
from urllib.parse import quote_plus, urlparse

import requests
from bs4 import BeautifulSoup
from openai import OpenAI
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
SOCIAL_DOMAINS = {
    'instagram.com': 'instagram',
    'facebook.com': 'facebook',
    'twitter.com': 'twitter',
    'x.com': 'twitter',
}


class BusinessScraper:
    """
    Scrape business information from directories

    Supports:
    - Yelp (business name, category, address, phone, website, reviews)
    - Google Maps (same as above + hours, photos)
    - Yellow Pages
    """

    # Hypothetical proxy pool, e.g. "http://user:pass@host:port" entries from a provider
    PROXIES: List[str] = []

    def __init__(self, use_proxy: bool = True):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--disable-blink-features=AutomationControlled')
        options.add_argument('user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)')
        if use_proxy and self.PROXIES:
            # Chrome fixes the proxy per session; rotating means starting a new driver
            options.add_argument(f'--proxy-server={self._get_proxy()}')
        self.driver = webdriver.Chrome(options=options)
        self.wait = WebDriverWait(self.driver, 10)
        self.openai = OpenAI()

    def _get_proxy(self) -> str:
        """Pick a random proxy from the pool"""
        return random.choice(self.PROXIES)

    def close(self):
        """Shut down the browser session"""
        self.driver.quit()

    def scrape_yelp(self, category: str, location: str, limit: int = 100) -> List[Dict]:
        """
        Scrape Yelp for businesses

        Example:
            scraper.scrape_yelp("restaurants", "New York, NY", 100)
        """
        businesses = []
        page = 0
        while len(businesses) < limit:
            # Construct search URL (URL-encode values like "New York, NY")
            url = (f"https://www.yelp.com/search?find_desc={quote_plus(category)}"
                   f"&find_loc={quote_plus(location)}&start={page * 10}")
            self.driver.get(url)
            # Random delay to avoid detection
            time.sleep(random.uniform(2, 5))

            # Parse results; Yelp's generated class names change often, so expect to update them
            soup = BeautifulSoup(self.driver.page_source, 'html.parser')
            results = soup.find_all('div', class_='arrange-unit__09f24__rqHTg')
            if not results:
                break  # blocked or markup changed; don't loop forever

            for result in results:
                if len(businesses) >= limit:
                    break
                try:
                    business = self._parse_yelp_listing(result)
                    if business:
                        businesses.append(business)
                except Exception as e:
                    print(f"Error parsing listing: {e}")

            # Check if there are more pages
            if not self._has_next_page(soup):
                break
            page += 1
        return businesses

    def _has_next_page(self, soup: BeautifulSoup) -> bool:
        """Assumed selector for Yelp's "next page" link; verify against the live markup"""
        return soup.find('a', class_='next-link') is not None

    def _parse_yelp_listing(self, element) -> Optional[Dict]:
        """Extract structured data from Yelp listing"""
        name_elem = element.find('a', class_='css-19v1rkv')
        if not name_elem:
            return None

        # Extract rating, e.g. aria-label="4.5 star rating"
        rating_elem = element.find('div', {'aria-label': lambda x: x and 'star rating' in x})
        rating = float(rating_elem['aria-label'].split()[0]) if rating_elem else 0

        # Extract review count, e.g. "(1,234 reviews)"
        review_elem = element.find('span', class_='css-chan6m')
        match = re.search(r'\d[\d,]*', review_elem.text) if review_elem else None
        reviews = int(match.group().replace(',', '')) if match else 0

        # Extract categories and neighborhood
        categories = [cat.text for cat in element.find_all('a', class_='css-11bijt4')]
        neighborhood_elem = element.find('span', class_='css-1p9ibgf')
        neighborhood = neighborhood_elem.text if neighborhood_elem else None

        return {
            'name': name_elem.text,
            'rating': rating,
            'review_count': reviews,
            'categories': categories,
            'neighborhood': neighborhood,
            'source': 'yelp'
        }

    def scrape_google_maps(self, query: str, location: str, limit: int = 100) -> List[Dict]:
        """Scrape Google Maps for businesses"""
        search = quote_plus(f"{query} in {location}")
        self.driver.get(f"https://www.google.com/maps/search/{search}")
        # Wait for the results panel instead of a fixed sleep (class name may change)
        results_div = self.wait.until(
            EC.presence_of_element_located((By.CLASS_NAME, 'feed-view'))
        )

        # Scroll the results panel to load more listings (roughly 10 per scroll)
        for _ in range(limit // 10):
            self.driver.execute_script(
                'arguments[0].scrollTop = arguments[0].scrollHeight',
                results_div
            )
            time.sleep(2)

        # Parse results
        soup = BeautifulSoup(self.driver.page_source, 'html.parser')
        listings = soup.find_all('div', class_='Nv2PK')
        businesses = []
        for listing in listings[:limit]:
            business = self._parse_gmaps_listing(listing)
            if business:
                businesses.append(business)
        return businesses

    def _parse_gmaps_listing(self, element) -> Optional[Dict]:
        """Extract data from Google Maps listing"""
        name_elem = element.find('div', class_='qBF1Pd')
        if not name_elem:
            return None

        rating_elem = element.find('span', class_='MW4etd')
        address_elem = element.find('div', class_='W4Efsd')
        phone_elem = element.find('span', class_='UsdlK')
        website_elem = element.find('a', {'data-value': 'Website'})

        return {
            'name': name_elem.text,
            'rating': float(rating_elem.text) if rating_elem else 0,
            'address': address_elem.text if address_elem else None,
            'phone': phone_elem.text if phone_elem else None,
            'website': website_elem.get('href') if website_elem else None,
            'source': 'google_maps'
        }

    def enrich_with_website_data(self, business: Dict) -> Dict:
        """
        Visit business website and extract additional data

        Extracts:
        - Email addresses
        - Social media links
        - About/description
        - Services offered
        """
        if not business.get('website'):
            return business
        try:
            response = requests.get(business['website'], timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            business['emails'] = self._extract_emails(soup)
            business['social_media'] = self._extract_social_links(soup)
            # Extract description using GPT-4
            business['description'] = self._summarize_website(soup.get_text(' ', strip=True))
        except Exception as e:  # network errors, bad HTML, or OpenAI API errors
            print(f"Error enriching {business['website']}: {e}")
        return business

    def _extract_emails(self, soup: BeautifulSoup) -> List[str]:
        """Extract unique email addresses from page text and mailto: links"""
        hrefs = ' '.join(a['href'] for a in soup.find_all('a', href=True))
        return sorted(set(EMAIL_RE.findall(soup.get_text(' ') + ' ' + hrefs)))

    def _extract_social_links(self, soup: BeautifulSoup) -> Dict:
        """Extract social media profile links"""
        social = {}
        for link in soup.find_all('a', href=True):
            host = urlparse(link['href']).netloc.lower()
            for domain, platform in SOCIAL_DOMAINS.items():
                # Match the domain or its subdomains, not substrings such as "netflix.com"
                if host == domain or host.endswith('.' + domain):
                    social[platform] = link['href']
        return social

    def _summarize_website(self, text: str) -> str:
        """Use GPT-4 to summarize website content"""
        response = self.openai.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "system", "content": "Summarize this business website in 2-3 sentences."},
                {"role": "user", "content": text[:4000]}  # Truncate to keep the prompt small
            ],
            max_tokens=150
        )
        return response.choices[0].message.content
```

3. Lead Scoring with Machine Learning
```python
# ml/lead_scorer.py
from typing import Dict, Optional

import pandas as pd
import requests
from bs4 import BeautifulSoup
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


class LeadScorer:
    """
    ML model to score lead quality (0-100)

    Features:
    - Business metrics (rating, review count, followers)
    - Website quality (has site, SSL, mobile-friendly)
    - Social presence (Instagram, Facebook followers)
    - Competitor analysis (similar businesses using our service)
    """

    FEATURES = [
        'rating',
        'review_count',
        'has_website',
        'instagram_followers',
        'facebook_followers',
        'website_quality_score'
    ]

    def __init__(self):
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        # Random forests don't need scaling; the scaler keeps other models swappable
        self.scaler = StandardScaler()
        self.trained = False

    def train(self, historical_leads: pd.DataFrame):
        """
        Train on historical data

        DataFrame columns:
        - rating: float (1-5)
        - review_count: int
        - has_website: bool
        - instagram_followers: int
        - facebook_followers: int
        - website_quality_score: float (0-1)
        - response_rate: float (used to derive the target)
        """
        X = historical_leads[self.FEATURES]
        # Binary target: 1 if the lead's response rate exceeded 5%
        y = (historical_leads['response_rate'] > 0.05).astype(int)

        X_scaled = self.scaler.fit_transform(X)
        self.model.fit(X_scaled, y)
        self.trained = True

        print(f"Model trained on {len(X)} leads")
        print(f"Feature importances: {dict(zip(self.FEATURES, self.model.feature_importances_))}")

    def score_lead(self, lead: Dict) -> float:
        """
        Score a single lead (0-100)

        Returns:
            score: Higher = better quality lead
        """
        if not self.trained:
            # Use heuristic scoring if model not trained
            return self._heuristic_score(lead)

        # Extract features (same columns, same order as in training)
        features = {
            'rating': lead.get('rating', 0),
            'review_count': lead.get('review_count', 0),
            'has_website': 1 if lead.get('website') else 0,
            'instagram_followers': self._get_instagram_followers(lead),
            'facebook_followers': self._get_facebook_followers(lead),
            'website_quality_score': self._assess_website_quality(lead.get('website'))
        }
        X = pd.DataFrame([features], columns=self.FEATURES)
        X_scaled = self.scaler.transform(X)

        # Probability of the positive class ("responded"), converted to a 0-100 scale
        proba = self.model.predict_proba(X_scaled)[0, 1]
        return float(proba * 100)

    def _heuristic_score(self, lead: Dict) -> float:
        """Fallback scoring without ML model (max 100 points)"""
        score = 0.0

        # Rating (0-25 points)
        rating = lead.get('rating', 0)
        score += (rating / 5) * 25

        # Review count (0-25 points, capped at 100 reviews)
        review_count = lead.get('review_count', 0)
        score += min(review_count / 100, 1) * 25

        # Website (0-20 points)
        if lead.get('website'):
            score += 20

        # Social media (0-15 points)
        if lead.get('social_media', {}).get('instagram'):
            score += 10
        if lead.get('social_media', {}).get('facebook'):
            score += 5

        # Email availability (0-15 points)
        if lead.get('emails'):
            score += 15

        return score

    def _get_instagram_followers(self, lead: Dict) -> int:
        """Fetch Instagram follower count"""
        if not lead.get('social_media', {}).get('instagram'):
            return 0
        # Use Instagram API or scraping to get follower count; simplified for example
        return lead.get('instagram_followers', 0)

    def _get_facebook_followers(self, lead: Dict) -> int:
        """Fetch Facebook follower count (simplified like the Instagram version)"""
        if not lead.get('social_media', {}).get('facebook'):
            return 0
        return lead.get('facebook_followers', 0)

    def _assess_website_quality(self, url: Optional[str]) -> float:
        """Score website quality (0-1)"""
        if not url:
            return 0.0
        try:
            response = requests.get(url, timeout=5)
        except requests.RequestException:
            return 0.0

        score = 0.0
        # SSL (0.3 points): check the final URL after any redirects
        if response.url.startswith('https://'):
            score += 0.3
        # Status code (0.2 points)
        if response.status_code == 200:
            score += 0.2
        # Mobile friendly (0.3 points): page declares a viewport meta tag
        soup = BeautifulSoup(response.content, 'html.parser')
        if soup.find('meta', attrs={'name': 'viewport'}):
            score += 0.3
        # Contact info (0.2 points)
        if 'contact' in response.text.lower():
            score += 0.2
        return score
```

4. GPT-4 Personalized Email Generation
```python
# outreach/email_generator.py
from typing import Dict, List

from openai import OpenAI

MODEL = "gpt-4-turbo-preview"

SYSTEM_PROMPT = """You are an expert cold email copywriter.
Rules:
1. Keep emails under 150 words
2. Use specific details about the prospect
3. Lead with value, not features
4. Include social proof (case studies)
5. Clear, singular CTA
6. Conversational tone
7. No hype or exaggeration"""


class EmailGenerator:
    """
    Generate personalized cold emails using GPT-4

    Features:
    - Hyper-personalization based on research
    - Multiple frameworks (AIDA, PAS, Before-After-Bridge)
    - A/B testing variants
    - Follow-up sequences
    """

    def __init__(self):
        self.client = OpenAI()
        self.case_studies = self._load_case_studies()

    def _chat(self, system: str, user: str, **kwargs) -> str:
        """Run one chat completion and return the reply text"""
        response = self.client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            **kwargs,
        )
        return response.choices[0].message.content

    def generate_cold_email(self, lead: Dict, framework: str = "AIDA") -> Dict:
        """
        Generate personalized cold email

        Args:
            lead: Enriched lead data (name, pain points, website summary, etc.)
            framework: "AIDA", "PAS", or "BAB"

        Returns:
            {'subject': str, 'body': str, 'ps': str}
        """
        # Select relevant case study
        case_study = self._match_case_study(lead)
        prompt = self._build_email_prompt(lead, case_study, framework)
        email_content = self._chat(SYSTEM_PROMPT, prompt, temperature=0.8, max_tokens=500)
        # Parse email into subject + body + PS
        return self._parse_email_parts(email_content)

    def _build_email_prompt(self, lead: Dict, case_study: Dict, framework: str) -> str:
        """Build GPT-4 prompt with lead-specific details"""
        # Google Maps leads have no categories/neighborhood, so fall back gracefully
        category = (lead.get('categories') or ['local business'])[0]
        area = lead.get('neighborhood') or lead.get('address') or 'their area'
        pain_point = (lead.get('pain_points') or ['Growing their online presence'])[0]

        return f"""
Write a personalized cold email to {lead['name']}, a {category} in {area}.

PROSPECT RESEARCH:
- Website summary: {lead.get('website_summary', 'No website')}
- Social media: {len(lead.get('social_media', {}))} platforms
- Key pain point: {pain_point}
- Rating: {lead.get('rating', 0)} stars ({lead.get('review_count', 0)} reviews)

OUR OFFER:
We help local businesses grow through social media marketing and web design.

RELEVANT CASE STUDY:
- Client: {case_study['name']} ({case_study['industry']})
- Result: {case_study['result']}
- Timeframe: {case_study['timeframe']}

FRAMEWORK: {framework}
{self._get_framework_guide(framework)}

Generate:
1. Subject line (7-10 words, specific and intriguing)
2. Email body (120-150 words)
3. P.S. line (optional, adds urgency or social proof)

Make it conversational, specific to {lead['name']}, and compelling.
"""

    def _get_framework_guide(self, framework: str) -> str:
        """Get email framework structure"""
        frameworks = {
            "AIDA": """
- Attention: Hook with specific observation about their business
- Interest: Mention their pain point
- Desire: Show case study result
- Action: Clear CTA (calendar link or reply)
""",
            "PAS": """
- Problem: Identify specific problem they face
- Agitate: Show consequences of not solving it
- Solve: Present solution with case study
""",
            "BAB": """
- Before: Describe their current situation
- After: Paint picture of success (use case study)
- Bridge: Show how we get them there
""",
        }
        return frameworks.get(framework, frameworks["AIDA"])

    def _match_case_study(self, lead: Dict) -> Dict:
        """Select most relevant case study for this lead"""
        # Match by industry/category keyword
        lead_category = (lead.get('categories') or [''])[0].lower()
        for case_study in self.case_studies:
            if any(industry in lead_category for industry in case_study['industries']):
                return case_study
        # Default to the first (most impressive) result
        return self.case_studies[0]

    def _load_case_studies(self) -> List[Dict]:
        """Load client success stories"""
        return [
            {
                'name': 'FitCheck',
                'industry': 'Fashion Tech',
                'industries': ['fashion', 'retail', 'boutique'],
                'result': '300% user growth in one quarter',
                'timeframe': '3 months'
            },
            {
                'name': 'Workwear',
                'industry': 'B2B Fashion',
                'industries': ['fashion', 'corporate', 'professional'],
                'result': '$2,000+ monthly revenue increase',
                'timeframe': '4 months'
            },
            {
                'name': 'Gloss Authority',
                'industry': 'Mobile Detailing',
                'industries': ['automotive', 'service', 'mobile'],
                'result': '150% lead increase and 2x ROI',
                'timeframe': '6 months'
            },
            {
                'name': 'Piccola Cucina',
                'industry': 'Restaurant',
                'industries': ['restaurant', 'food', 'dining'],
                'result': '15% sales increase in 5 weeks',
                'timeframe': '5 weeks'
            },
            {
                'name': 'Capio Tattoo',
                'industry': 'Creative Arts',
                'industries': ['creative', 'art', 'studio'],
                'result': '10,000+ Instagram followers',
                'timeframe': '6 months'
            }
        ]

    def _parse_email_parts(self, email_content: str) -> Dict:
        """Parse GPT-4 output into structured email"""
        subject = ""
        body_lines = []
        ps = ""
        for line in email_content.split('\n'):
            line = line.strip()
            # Ignore list numbering / markdown bold such as "1. **Subject:**"
            label = line.lstrip('*#0123456789. ').lower()
            if label.startswith(('subject:', 'subject line:')):
                subject = line.split(':', 1)[1].strip(' *')
            elif label.startswith(('p.s.', 'ps:')):
                ps = line
            elif line:
                body_lines.append(line)

        return {
            'subject': subject or "Quick question about your business",
            'body': '\n\n'.join(body_lines),
            'ps': ps
        }

    def generate_followup_sequence(self, lead: Dict, original_email: Dict) -> List[Dict]:
        """
        Generate 3-email follow-up sequence

        Day 0: Initial email (sent separately)
        Day 3: Follow-up #1 (value add)
        Day 7: Follow-up #2 (case study deep dive)
        Day 14: Follow-up #3 (breakup email)
        """
        followups = []

        # Follow-up 1: Add value
        followup1 = self._chat(
            "Write a brief follow-up email (50-75 words) that adds value without being pushy.",
            f"""Original email subject: {original_email['subject']}
Prospect: {lead['name']}
Write follow-up that:
1. Acknowledges they're busy
2. Shares quick tip or insight relevant to their business
3. Soft CTA"""
        )
        followups.append({
            'day': 3,
            'subject': f"Re: {original_email['subject']}",
            'body': followup1
        })

        # Follow-up 2: Case study deep dive
        case_study = self._match_case_study(lead)
        followup2 = self._chat(
            "Write a case study-focused follow-up (75-100 words).",
            f"""Prospect: {lead['name']}
Case study: {case_study['name']} - {case_study['result']}
Write follow-up that:
1. Shares detailed case study
2. Explains how it's relevant to them
3. Offers free consultation"""
        )
        followups.append({
            'day': 7,
            'subject': f"How {case_study['name']} achieved {case_study['result']}",
            'body': followup2
        })

        # Follow-up 3: Breakup email (static template, no API call).
        # lead['name'] is the business name, so greet a scraped contact name when present
        # ('contact_name' is a hypothetical field) rather than e.g. "Hi Piccola".
        first_name = (lead.get('contact_name') or 'there').split()[0]
        followup3 = (
            f"Hi {first_name},\n\n"
            "I haven't heard back so I'll assume this isn't a priority right now.\n\n"
            "If things change, feel free to reach out. I'll be here.\n\n"
            "Best of luck with your business!"
        )
        followups.append({
            'day': 14,
            'subject': "Closing the loop",
            'body': followup3
        })
        return followups
```

Results
Platform Metrics (12 Months)
| Metric | Value |
|---|---|
| Prospects Processed | 62,000+ |
| Qualified Leads Generated | 5,200/month |
| Emails Sent | 18,500/month |
| Response Rate | 8.2% (vs 2% manual) |
| Meeting Booking Rate | 3.1% |
| Client Acquisition Rate | 3x increase |
Client Success Stories
FitCheck (Fashion Tech)
- Challenge: Unknown startup, needed users
- Solution: Targeted NYC fashion enthusiasts with influencer partnerships
- Result: 300% user growth in 3 months
Workwear (B2B Fashion)
- Challenge: Competing with established B2B platforms
- Solution: Focused on corporate styling niche
- Result: $2,000+ monthly revenue boost
Gloss Authority (Mobile Detailing)
- Challenge: Crowded market, low differentiation
- Solution: Hyper-local SEO + viral before/after content
- Result: 150% lead increase, 2x ROI in 6 months
Piccola Cucina (Restaurant)
- Challenge: Competitive Brooklyn restaurant scene
- Solution: 3 viral videos in first 3 weeks
- Result: 15% sales increase in 5 weeks
Cost Savings
| Metric | Manual Process | Automated System | Improvement |
|---|---|---|---|
| Time per 100 leads | 20 hours | 45 min | 96% faster |
| Cost per lead | $15-20 | $0.50 | 97% cheaper |
| Response rate | 2% | 8.2% | 4x better |
| Monthly labor cost | $9,000 | $300 (compute) | $8,700 saved |
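The percentages in the table can be reproduced from the raw figures. Below is a quick check in Python. The table does not say which end of the $15-20 manual cost per lead it used, so this sketch assumes the $17.50 midpoint. The endpoints give roughly 96.7% and 97.5%.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (1 - after / before) * 100


time_saving = pct_reduction(20 * 60, 45)   # 20 hours -> 45 minutes, both in minutes
cost_saving = pct_reduction(17.50, 0.50)   # assumed $17.50 midpoint -> $0.50 per lead
response_lift = 8.2 / 2                    # 8.2% vs 2% response rate
monthly_saved = 9_000 - 300                # labor cost minus compute cost, in dollars

print(f"{time_saving:.0f}% faster")          # 96% faster
print(f"{cost_saving:.0f}% cheaper")         # 97% cheaper
print(f"{response_lift:.1f}x response rate") # 4.1x response rate
print(f"${monthly_saved:,} saved per month") # $8,700 saved per month
```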
Challenges & Solutions
Challenge 1: Email Deliverability
Problem: 40% of cold emails went to spam, killing response rates.
Solution: Multi-pronged approach
- Warmed up email domains (gradual sending increase over 2 weeks)
- SPF, DKIM, DMARC authentication
- Personalized sender names (not "no-reply@")
- No spam trigger words ("free", "guaranteed", etc.)
- Balanced content (mostly text, few links)
Result: Spam rate dropped to 8%, inbox rate increased to 85%.
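Domain warm-up means raising daily send volume gradually, so mailbox providers can build a reputation for a new sending domain. The post only says the ramp took about two weeks. The sketch below assumes a geometric ramp from 20 to 500 emails per day over 14 days; these numbers are illustrative, not the values actually used. `warmup_schedule` and `can_send` are hypothetical helper names.

```python
from typing import List


def warmup_schedule(start: int = 20, target: int = 500, days: int = 14) -> List[int]:
    """Daily send caps growing geometrically from `start` to `target` over `days`.

    Defaults are illustrative; real ramps are usually tuned against bounce and spam metrics.
    """
    if days < 2:
        return [target]
    growth = (target / start) ** (1 / (days - 1))  # constant day-over-day multiplier
    return [round(start * growth ** d) for d in range(days)]


def can_send(sent_today: int, day_index: int, schedule: List[int]) -> bool:
    """Throttle check: stay under today's cap; after the ramp ends, hold the final cap."""
    cap = schedule[min(day_index, len(schedule) - 1)]
    return sent_today < cap


schedule = warmup_schedule()
print(schedule)  # 14 daily caps from 20 up to 500
```

A geometric ramp grows volume by the same percentage each day, so early days stay small while the domain has no sending history.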
Challenge 2: Scraping Detection
Problem: Yelp and Google Maps blocked our scrapers after 100-200 requests.
Solution:
- Rotating proxy pool (50 residential IPs)
- Random delays (2-7 sec between requests)
- Human-like behavior (mouse movements, scrolling)
- Session cookies and user agents rotation
Result: Successfully scraped 5k+ businesses/day without blocks.
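Proxy rotation, user-agent rotation, and random delays can be combined in one small fetcher. It cycles through a proxy pool and waits a random 2-7 seconds between requests. The sketch below is minimal and uses only the standard library (`urllib.request`). It assumes you already have a list of proxy URLs from a provider; the proxy addresses and user-agent strings are placeholders. `PoliteFetcher` is a hypothetical name, and the sketch does not cover the Selenium-side mouse movement or scrolling.

```python
import itertools
import random
import time
import urllib.request
from typing import Iterable, List, Optional

# Placeholder user-agent strings; in practice keep a list of current browser UAs
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]


class PoliteFetcher:
    """Round-robin proxy rotation, user-agent rotation, and random delays between requests."""

    def __init__(self, proxies: Iterable[str], min_delay: float = 2.0, max_delay: float = 7.0):
        self._proxies = itertools.cycle(list(proxies))
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._last_request: Optional[float] = None

    def next_proxy(self) -> str:
        """Next proxy URL in round-robin order."""
        return next(self._proxies)

    def _wait(self) -> None:
        """Sleep until a random gap (2-7 s by default) has passed since the last request."""
        gap = random.uniform(self.min_delay, self.max_delay)
        if self._last_request is not None:
            remaining = gap - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()

    def get(self, url: str) -> bytes:
        """Fetch `url` through the next proxy with a randomly chosen user agent."""
        self._wait()
        proxy = self.next_proxy()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        request = urllib.request.Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
        with opener.open(request, timeout=10) as response:
            return response.read()
```

Usage would look like `PoliteFetcher(["http://proxy1.example:8080", "http://proxy2.example:8080"]).get(url)`. Round-robin spreads requests evenly across the pool, so no single IP reaches the 100-200 request threshold quickly.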
Challenge 3: GPT-4 Hallucinations
Problem: GPT-4 occasionally invented fake case study details or made up statistics.
Solution: Structured prompts with validation
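```python
import re
from typing import Dict, List

# Figures such as "300%", "$2,000", "2x", or the "5" in "5 weeks"
NUMBER_RE = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?(?:%|x)?")
SUSPICIOUS = ["guarantee", "100%", "instant", "overnight"]


def verify_case_study_facts(email: str, case_studies: List[Dict]) -> bool:
    # One possible strict check: every figure in the email must come from a real case study
    allowed = {n for s in case_studies
               for n in NUMBER_RE.findall(f"{s['result']} {s['timeframe']}")}
    return all(n in allowed for n in NUMBER_RE.findall(email))


def validate_email_content(email: str, case_studies: List[Dict]) -> bool:
    # If a case study is named, its numbers must match our records
    if any(s['name'] in email for s in case_studies):
        if not verify_case_study_facts(email, case_studies):
            return False
    # Reject suspicious claims
    return not any(word in email.lower() for word in SUSPICIOUS)
```

Result: Hallucination rate dropped from 12% to <1%.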
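A validator like this is most useful inside a retry loop: when a draft fails the checks, ask the model for a fresh one and give up after a few attempts. The sketch below is minimal and model-agnostic. `generate` and `validate` are any callables, for example wrappers around `EmailGenerator.generate_cold_email` and `validate_email_content`. The three-attempt limit is an assumption.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def generate_until_valid(
    generate: Callable[[], T],
    validate: Callable[[T], bool],
    max_attempts: int = 3,
) -> Optional[T]:
    """Call `generate` until `validate` accepts the result.

    Returns None when every attempt fails, so the caller can skip the lead or
    queue it for manual review instead of sending an unvalidated email.
    """
    for _ in range(max_attempts):
        candidate = generate()
        if validate(candidate):
            return candidate
    return None


# Example: the first draft trips the "guarantee" check, the second passes
drafts = iter(["We guarantee results overnight!", "Saw your 4.8-star rating on Yelp..."])
email = generate_until_valid(lambda: next(drafts), lambda e: "guarantee" not in e.lower())
print(email)  # Saw your 4.8-star rating on Yelp...
```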
Future Enhancements
1. Voice AI for Follow-Up Calls
Auto-dial leads with conversational AI:
```python
# Pseudocode: VoiceAI and personalize_script are hypothetical wrappers
# (e.g. around a text-to-speech/voice API), not an existing library API
voice_agent = VoiceAI(
    voice="professional_female",
    script_template="Hi {name}, I sent you an email about {topic}..."
)

# Auto-call qualified leads
for lead in high_score_leads:
    voice_agent.call(lead['phone'], personalize_script(lead))
```

2. LinkedIn Outreach Integration
Expand to LinkedIn for B2B:
```python
# Pseudocode: find_linkedin_profiles, send_linkedin_message, and
# generate_linkedin_message are hypothetical helpers

# Find decision makers on LinkedIn
linkedin_profiles = find_linkedin_profiles(company_name)

# Send InMail with GPT-4 personalization
for profile in linkedin_profiles:
    if profile['title'] in ['Owner', 'CEO', 'Marketing Director']:
        send_linkedin_message(profile, generate_linkedin_message(profile))
```

3. Predictive Lead Scoring
Use historical conversion data to improve scoring:
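```python
from xgboost import XGBClassifier

# Train on closed deals (features_from_leads is a hypothetical feature extractor)
X = features_from_leads(closed_deals)
y = [1 if deal.converted else 0 for deal in closed_deals]
model = XGBClassifier()
model.fit(X, y)

# Predict conversion probability (column 1 = probability of converting)
conversion_prob = model.predict_proba(new_lead_features)[:, 1]
```

Conclusion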
Building an AI-powered lead generation system transformed Lume's client acquisition:
- 62,000+ prospects processed
- 5,200 qualified leads/month
- 8.2% email response rate (4x the 2% manual baseline)
- 3x client acquisition rate
- 96% time savings vs manual prospecting
Key Technical Wins:
- LangChain multi-agent orchestration
- Web scraping with anti-detection (5k+ businesses/day)
- ML lead scoring (70+ score threshold)
- GPT-4 hyper-personalized emails
- Automated follow-up sequences
Technologies: Python, LangChain, OpenAI GPT-4, React, MongoDB, BeautifulSoup, Selenium, SendGrid
Timeline: 8 weeks from prototype to production
Impact: Enabled 15+ local businesses to scale digital marketing, achieving 300% user growth for FitCheck, $2k+ monthly revenue for Workwear, and 2x ROI for Gloss Authority
This project showed that agentic AI, web scraping, and personalization together can automate complex workflows that previously required human expertise, turning cold outreach from a numbers game into a precision instrument.