Project Background
I embarked on this project fueled by my newfound passion for DJing. As I delved into this realm, I recognized the multifaceted challenges DJs face — juggling BPM, valence, energy, and more. It seemed navigating these variables was a labyrinthine task. I also noticed the absence of dedicated platforms tailored specifically for DJs.
My vision was clear: to craft a comprehensive site, a hub exclusively for DJs. A platform not just for sharing remixes but also a thriving social network to amplify their reach. Moreover, I envisioned an intelligent recommender system and an array of tools and resources aimed at empowering DJs to enhance their skills.
The Technical Challenge
The core problem: How do you predict which songs will mix well together based on their musical features?
This required:
- Processing 170,000+ songs with 8 audio features each
- Training a regression model to predict "mixability scores"
- Deploying real-time inference at <500ms latency
- Integrating with Spotify API for live song data
- Building a scalable database architecture
Architecture Overview
┌─────────────┐
│ React UI │
└──────┬──────┘
│
▼
┌─────────────┐ ┌──────────────┐
│ Spotify │────▶│ Feature │
│ API │ │ Extraction │
└─────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ AWS Lambda │
│ (Inference) │
└──────┬───────┘
│
▼
┌─────────────┐ ┌──────────────┐
│ MongoDB │◀────│ Sagemaker │
│ (170k) │ │ Model │
└─────────────┘ └──────────────┘
Part 1: Data Collection & Processing
Dataset Acquisition
I sourced a comprehensive dataset containing 170,000 songs, with each entry enriched by 8 features:
- BPM (Beats Per Minute)
- Energy
- Valence (musical positivity)
- Danceability
- Acousticness
- Instrumentalness
- Liveness
- Speechiness
Feature Engineering Script
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
# Initialize Spotify API
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
client_id='YOUR_CLIENT_ID',
client_secret='YOUR_CLIENT_SECRET'
))
def extract_audio_features(track_id):
"""Extract audio features for a single track"""
features = sp.audio_features(track_id)[0]
return {
'id': track_id,
'bpm': features['tempo'],
'energy': features['energy'],
'valence': features['valence'],
'danceability': features['danceability'],
'acousticness': features['acousticness'],
'instrumentalness': features['instrumentalness'],
'liveness': features['liveness'],
'speechiness': features['speechiness']
}
def get_genre(track_id):
"""Retrieve genre for a track via artist lookup"""
track = sp.track(track_id)
artist_id = track['artists'][0]['id']
artist = sp.artist(artist_id)
return artist['genres'][0] if artist['genres'] else 'unknown'
# Process entire dataset
df = pd.read_csv('raw_tracks.csv')
df['features'] = df['track_id'].apply(extract_audio_features)
df['genre'] = df['track_id'].apply(get_genre)
# Normalize features for model training
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['bpm', 'energy', 'valence', 'danceability']] = scaler.fit_transform(
df[['bpm', 'energy', 'valence', 'danceability']]
)
df.to_csv('processed_170k_tracks.csv', index=False)
print(f"Processed {len(df)} tracks")Part 2: Model Training with AWS Sagemaker
August 8th, 11:03 PM - Initial Setup
Sagemaker kickstarted Mixmeister's training pipeline. I began with regression for the mixability score, considering future classification via Spotify's user preferences.
Training Script
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn
# Initialize Sagemaker session
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = 'djdoc-ml-models'
# Upload training data to S3
train_data = sagemaker_session.upload_data(
path='processed_170k_tracks.csv',
bucket=bucket,
key_prefix='training'
)
# Define training script
sklearn_estimator = SKLearn(
entry_point='train_mixing_model.py',
role=role,
instance_type='ml.m5.xlarge',
framework_version='0.23-1',
hyperparameters={
'estimator': 'RandomForest',
'n_estimators': 200,
'max_depth': 20,
'min_samples_split': 5
}
)
# Start training job
sklearn_estimator.fit({'train': train_data})The Mixing Score Algorithm
The core of the model: predicting how well two songs mix based on feature similarity.
# train_mixing_model.py
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import joblib
def calculate_mixing_score(song1_features, song2_features):
"""
Calculate mixing compatibility score between two songs
Score ranges from 0 (poor mix) to 100 (perfect mix)
"""
# BPM difference (most critical)
bpm_diff = abs(song1_features['bpm'] - song2_features['bpm'])
bpm_score = max(0, 100 - (bpm_diff * 2)) # Penalize BPM difference heavily
# Energy compatibility
energy_diff = abs(song1_features['energy'] - song2_features['energy'])
energy_score = max(0, 100 - (energy_diff * 100))
# Valence (mood) compatibility
valence_diff = abs(song1_features['valence'] - song2_features['valence'])
valence_score = max(0, 100 - (valence_diff * 80))
# Weighted combination
mixing_score = (
bpm_score * 0.4 +
energy_score * 0.3 +
valence_score * 0.3
)
return mixing_score
# Generate training pairs
def generate_training_pairs(df):
"""Create pairwise combinations for training"""
pairs = []
for i in range(len(df)):
for j in range(i+1, min(i+100, len(df))): # Sample 100 pairs per song
score = calculate_mixing_score(df.iloc[i], df.iloc[j])
pairs.append({
'features': np.concatenate([
df.iloc[i][['bpm', 'energy', 'valence', 'danceability']].values,
df.iloc[j][['bpm', 'energy', 'valence', 'danceability']].values
]),
'score': score
})
return pairs
# Train model
df = pd.read_csv('processed_170k_tracks.csv')
training_data = generate_training_pairs(df)
X = np.array([p['features'] for p in training_data])
y = np.array([p['score'] for p in training_data])
model = RandomForestRegressor(n_estimators=200, max_depth=20, random_state=42)
model.fit(X, y)
# Evaluate
predictions = model.predict(X)
print(f"R² Score: {r2_score(y, predictions):.3f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y, predictions)):.3f}")
# Save model
joblib.dump(model, '/opt/ml/model/mixing_model.pkl')August 10th, 12:40 AM - Algorithm Optimization
Shifted from random forest classification to regression algorithm for enhanced performance. The regression approach provided continuous mixing scores rather than discrete categories, giving DJs more nuanced recommendations.
Part 3: Deployment Pipeline
August 14th, 1:48 AM - Model Endpoint Deployment
Mixmeister went live! The goal: generating mixing scores for searched songs and suggesting 3–5 compatible options.
Lambda Function for Real-Time Inference
# lambda_handler.py
import json
import boto3
import numpy as np
# Initialize Sagemaker runtime
runtime = boto3.client('sagemaker-runtime')
endpoint_name = 'djdoc-mixing-model'
def lambda_handler(event, context):
"""
Handle mixing score prediction requests
Input: Current song features + desired genre
Output: Top 5 compatible songs
"""
try:
# Parse request
body = json.loads(event['body'])
current_song_features = body['song_features']
target_genre = body.get('genre', 'all')
# Query MongoDB for candidate songs
candidates = get_candidate_songs(target_genre)
# Prepare features for batch prediction
feature_pairs = []
for candidate in candidates:
pair = np.concatenate([
current_song_features,
candidate['features']
])
feature_pairs.append(pair)
# Invoke Sagemaker endpoint
payload = json.dumps({'instances': feature_pairs})
response = runtime.invoke_endpoint(
EndpointName=endpoint_name,
ContentType='application/json',
Body=payload
)
# Parse predictions
predictions = json.loads(response['Body'].read())['predictions']
# Rank and return top 5
scored_songs = [
{'song': candidates[i], 'score': predictions[i]}
for i in range(len(candidates))
]
scored_songs.sort(key=lambda x: x['score'], reverse=True)
top_5 = scored_songs[:5]
return {
'statusCode': 200,
'headers': {
'Content-Type': 'application/json',
'Access-Control-Allow-Origin': '*'
},
'body': json.dumps(top_5)
}
except Exception as e:
print(f"Error: {str(e)}")
return {
'statusCode': 500,
'body': json.dumps({'error': str(e)})
}
def get_candidate_songs(genre):
"""Query MongoDB for songs in target genre"""
from pymongo import MongoClient
client = MongoClient(os.environ['MONGO_URI'])
db = client['djdoc']
query = {} if genre == 'all' else {'genre': genre}
songs = list(db.tracks.find(query).limit(100))
return songsAugust 23rd, 2:18 PM - CORS Fix & MongoDB Integration
Training to deployment via endpoints, Lambda functions, and API Gateways was smooth. But a CORS issue needed fixing.
// API Gateway CORS Configuration
module.exports.handler = async (event) => {
// Handle preflight OPTIONS request
if (event.requestContext.http.method === 'OPTIONS') {
return {
statusCode: 200,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
'Access-Control-Allow-Headers': 'Content-Type, Authorization',
},
body: '',
};
}
// Forward to Lambda
// ... rest of handler
};MongoDB Schema Design
// MongoDB Track Schema
const trackSchema = {
_id: ObjectId,
track_id: String, // Spotify ID
name: String,
artist: String,
album: String,
genre: String,
features: {
bpm: Number,
energy: Number,
valence: Number,
danceability: Number,
acousticness: Number,
instrumentalness: Number,
liveness: Number,
speechiness: Number
},
mixing_scores: [{ // Pre-computed scores for fast lookup
with_track_id: String,
score: Number
}],
created_at: Date
};
// Indexes for fast queries
db.tracks.createIndex({ genre: 1, "features.bpm": 1 });
db.tracks.createIndex({ "features.energy": 1, "features.valence": 1 });Part 4: Frontend Implementation
React Component for Song Search
// MixMeister.tsx
import React, { useState } from 'react';
import axios from 'axios';
interface Song {
id: string;
name: string;
artist: string;
bpm: number;
mixingScore: number;
}
export const MixMeister: React.FC = () => {
const [currentSong, setCurrentSong] = useState<Song | null>(null);
const [genre, setGenre] = useState<string>('house');
const [recommendations, setRecommendations] = useState<Song[]>([]);
const [loading, setLoading] = useState(false);
const searchSong = async (query: string) => {
// Search Spotify
const response = await axios.get('/api/search', {
params: { q: query }
});
setCurrentSong(response.data.tracks[0]);
};
const getMixingRecommendations = async () => {
if (!currentSong) return;
setLoading(true);
try {
const response = await axios.post(
'https://api.djdoc.com/mixing-score',
{
song_features: currentSong.features,
genre: genre
}
);
setRecommendations(response.data);
} catch (error) {
console.error('Error getting recommendations:', error);
} finally {
setLoading(false);
}
};
return (
<div className="mixmeister-container">
<h1>MixMeister</h1>
{/* Song Search */}
<input
type="text"
placeholder="Search for your current song..."
onChange={(e) => searchSong(e.target.value)}
/>
{/* Genre Selection */}
<select value={genre} onChange={(e) => setGenre(e.target.value)}>
<option value="house">House</option>
<option value="techno">Techno</option>
<option value="hip-hop">Hip Hop</option>
<option value="all">All Genres</option>
</select>
{/* Current Song Display */}
{currentSong && (
<div className="current-song">
<h3>{currentSong.name}</h3>
<p>{currentSong.artist}</p>
<p>BPM: {currentSong.bpm}</p>
<button onClick={getMixingRecommendations}>
Find Compatible Tracks
</button>
</div>
)}
{/* Recommendations */}
{loading ? (
<p>Finding the perfect mix...</p>
) : (
<div className="recommendations">
{recommendations.map((song) => (
<div key={song.id} className="recommendation-card">
<h4>{song.name}</h4>
<p>{song.artist}</p>
<div className="mixing-score">
Mixing Score: {song.mixingScore.toFixed(1)}/100
</div>
</div>
))}
</div>
)}
</div>
);
};Part 5: Firebase Authentication
By integrating Firebase Authentication, I regulated access to specific pages post user sign-up and login, heightening platform security.
// auth.ts
import { initializeApp } from 'firebase/app';
import { getAuth, signInWithEmailAndPassword, createUserWithEmailAndPassword } from 'firebase/auth';
const firebaseConfig = {
apiKey: process.env.NEXT_PUBLIC_FIREBASE_API_KEY,
authDomain: "djdoc-platform.firebaseapp.com",
projectId: "djdoc-platform",
};
const app = initializeApp(firebaseConfig);
export const auth = getAuth(app);
export const signUp = async (email: string, password: string) => {
try {
const userCredential = await createUserWithEmailAndPassword(auth, email, password);
return userCredential.user;
} catch (error) {
console.error('Sign up error:', error);
throw error;
}
};
export const signIn = async (email: string, password: string) => {
try {
const userCredential = await signInWithEmailAndPassword(auth, email, password);
return userCredential.user;
} catch (error) {
console.error('Sign in error:', error);
throw error;
}
};Part 6: Spotify Converter Feature
Built a download pipeline using spotDL library for converting Spotify playlists to local files.
# spotify_converter.py
from flask import Flask, request, jsonify, send_file
import spotipy
from spotdl import Spotdl
import zipfile
import os
app = Flask(__name__)
@app.route('/api/convert-playlist', methods=['POST'])
def convert_playlist():
"""
Convert Spotify playlist to downloadable zip
"""
playlist_id = request.json['playlist_id']
# Initialize Spotify client
sp = spotipy.Spotify(auth_manager=SpotifyOAuth())
# Get playlist tracks
results = sp.playlist_tracks(playlist_id)
tracks = results['items']
# Download tracks using spotDL
spotdl = Spotdl(
client_id=os.getenv('SPOTIFY_CLIENT_ID'),
client_secret=os.getenv('SPOTIFY_CLIENT_SECRET')
)
download_dir = f'/tmp/playlist_{playlist_id}'
os.makedirs(download_dir, exist_ok=True)
for track in tracks:
track_url = track['track']['external_urls']['spotify']
spotdl.download_songs([track_url], download_dir)
# Zip files
zip_path = f'/tmp/playlist_{playlist_id}.zip'
with zipfile.ZipFile(zip_path, 'w') as zipf:
for root, dirs, files in os.walk(download_dir):
for file in files:
zipf.write(os.path.join(root, file))
return send_file(zip_path, as_attachment=True)
if __name__ == '__main__':
app.run(debug=True)Part 7: Performance & Cost Optimization
September 8th, 10:30 PM - Project Completion
The Regen button operates seamlessly. However, model endpoint was shutdown due to loss of $400 on AWS.
Lessons Learned:
- Always use endpoint autoscaling limits
# Sagemaker endpoint config with limits
endpoint_config = {
'VariantName': 'AllTraffic',
'ModelName': 'djdoc-mixing-model',
'InitialInstanceCount': 1,
'InstanceType': 'ml.t2.medium', # Cheaper instance
'AutoScalingPolicy': {
'MinCapacity': 0, # Scale to zero when idle
'MaxCapacity': 2
}
}- Implement caching to reduce API calls
// Redis caching layer
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
async function getCachedRecommendations(songId: string, genre: string) {
const cacheKey = `recs:${songId}:${genre}`;
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// If not cached, call Lambda
const recommendations = await getRecommendations(songId, genre);
// Cache for 1 hour
await redis.setex(cacheKey, 3600, JSON.stringify(recommendations));
return recommendations;
}- Pre-compute mixing scores for common pairs
Final Architecture & Metrics
System Performance
- Latency: <500ms for recommendations
- Database Size: 170,000 tracks indexed
- Concurrent Users: Handled 50+ simultaneous requests
- Cost: Reduced from $400/month to $50/month after optimizations
Tech Stack Summary
ML Pipeline:
- AWS Sagemaker for model training
- Random Forest Regressor (200 estimators)
- Python + scikit-learn
Backend:
- AWS Lambda for serverless inference
- AWS API Gateway for REST endpoints
- MongoDB for track database
- Redis for caching
- Node.js + Express
Frontend:
- React + TypeScript
- Firebase Authentication
- Spotify Web API
- Axios for API calls
DevOps:
- AWS CloudWatch for monitoring
- GitHub Actions for CI/CD
- Docker for containerization
Key Takeaways
- Feature engineering is critical - The mixing score algorithm needed careful weighting of BPM vs energy vs valence
- Cost monitoring is essential - Always set up billing alerts and endpoint limits
- Caching saves money - Redis reduced API calls by 80%
- Real-time ML inference is hard - <500ms latency required careful optimization
What's Next
Future enhancements planned:
- Harmonix social network - Enable DJs to share and discover remixes
- RemixMaster - AI-powered music remixing using Meta's AudioCraft
- Advanced ML model - Incorporate user listening history for personalized recommendations
- Mobile app - React Native version for on-the-go mixing