Congruence

Documentation OS for Therapy Practices

Live
Healthcare SaaS
Full-Stack

Standardized psychotherapy notes. Insurance-ready progress notes. Real-time supervision visibility — for 5–25 clinician practices that need audit-ready ops, not another EHR.

11

Edge Functions

40+

DB Tables (RLS)

5

Workflow Stages

3-tier

RBAC

10

Form Field Types

SHA-256

Token Security

Problem

Multi-clinician therapy practices run on fragmented workflows. Each clinician documents differently. Admins have no structured visibility into notes, risk signals, or billing status. Insurance audits are increasing. Revenue leaks.

Solution

A Documentation OS — not a note-taking app. Every session is structured into a compliant, reviewable workflow with AI-generated notes, supervision dashboards, and an integrated billing + insurance pipeline.

Why I Built It

Therapy practices are running on spreadsheets and PDFs. I built Congruence to give clinicians an operations layer that handles the admin work so they can focus on the patient.

Core Workflow

Every session becomes structured data.

Session Capture
Note Structuring
Risk Detection
Compliance
Admin Dashboard
Supervision

Intake

Template-driven form packets, document uploads, consent checklist gates

Recording

In-browser MediaRecorder video/audio → Supabase Storage

Analysis

DeepFace emotion timeline + Gemini 2.5 Flash clinical summary + risk flags

Progress

Patient progress timeline visualization across sessions

Insurance

Payer profile + AI-generated CMS-1500 reauthorization packet

Key Features I Built

Full clinical operations — one platform.

Invite-only RBAC

3-tier role system (super_admin → admin → clinician) with single-use invite tokens. Every route and table enforced via Supabase RLS with active-status checks.

Supabase RLS · JWT · Deno Edge
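The tier ordering can be sketched as a simple rank comparison. This is illustrative only: the role names come from the product, but `ROLE_RANK` and `can_access` are hypothetical helpers, and the real enforcement lives in Supabase RLS policies with active-status checks.

```python
# Sketch of the 3-tier role hierarchy with an active-status guard.
# Role names are from the product; the ranking scheme is an assumption.
ROLE_RANK = {"clinician": 1, "admin": 2, "super_admin": 3}

def can_access(user_role: str, user_active: bool, required_role: str) -> bool:
    """Allow access only for active users at or above the required tier."""
    if not user_active:
        return False  # mirrors the RLS active-status check
    return ROLE_RANK[user_role] >= ROLE_RANK[required_role]
```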

Patient Dashboard

Searchable patient table with triage strip, clinical tags, risk-level indicators, and a detail panel with session metrics and trend arrows.

React 18 · Pagination · Triage

Client Forms System

SHA-256 token-hashed secure links, multi-step wizard with 10 field types, dynamic SchemaFormRenderer, server-side validation via public edge endpoints.

SHA-256 · Deno Edge · Schema-driven
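A minimal sketch of the token scheme described above, using Python's standard library (function names are illustrative; the production code runs in Deno Edge Functions). The point of hashing: only the SHA-256 digest is persisted, so a leaked database row cannot reconstruct the client's secure link.

```python
import hashlib
import secrets

def issue_form_link_token() -> tuple[str, str]:
    """Generate a raw token for the client link and the SHA-256 digest stored server-side."""
    raw = secrets.token_urlsafe(32)                     # embedded in the secure link
    digest = hashlib.sha256(raw.encode()).hexdigest()   # the only thing stored in the DB
    return raw, digest

def verify_token(raw: str, stored_digest: str) -> bool:
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(raw.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)
```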

Scheduling

Per-clinician availability rules, blocked-day exceptions, approval-gated booking links, and a has_time_conflict() DB function preventing double-booking.

PostgreSQL · Edge Functions · Conflict Detection
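The overlap predicate behind a `has_time_conflict()`-style check can be sketched in Python (the real version is a PostgreSQL function running with row-level locking; this shows only the interval logic):

```python
from datetime import datetime

def has_time_conflict(existing: list[tuple[datetime, datetime]],
                      start: datetime, end: datetime) -> bool:
    """Two intervals overlap iff each one starts before the other ends.
    Back-to-back bookings (end == next start) do not conflict."""
    return any(s < end and start < e for s, e in existing)
```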

Billing + Insurance

Stripe Connect multi-tenant billing with commission splits, QuickBooks-inspired invoicing, CSV exports, and AI-assisted CMS-1500 insurance packet generation.

Stripe Connect · Gemini AI · CMS-1500

Admin Portal

Super-admin launchpad with clinic management, global user controls, bulk onboarding (50 users), assignment maps, audit logs, and usage analytics.

Audit Logs · Analytics · Multi-tenant

AI Pipeline

Signals clinicians might miss.

Input Signals

Voice tone shifts
Facial affect patterns
Language incongruence
Cross-session escalation

AI Engine

DeepFace + Gemini 2.5 Flash

Multimodal emotion detection
Affect-language incongruence scoring
Cross-session escalation patterns
Clinical summary generation
Risk flag classification
FastAPI · Deno Edge proxy

Output

Timestamped markers
Pattern mapping
Escalation flags
Insurance-ready documentation

No-Hallucination Policy

Missing data appears as [BRACKETED PLACEHOLDERS]; the AI never invents clinical information.

The AI Agent: How We Built It

Our multimodal AI agent is the core intelligence layer that transforms raw therapy sessions into structured clinical insights. Here's the complete architecture from data ingestion to clinical output.

AI Agent Architecture

Why We Built It This Way

Traditional therapy documentation is reactive — clinicians write notes after the session ends, relying on memory and missing critical non-verbal cues. We needed an AI system that could process multimodal signals in real time, detect emotional incongruence that humans miss, and generate clinically grounded documentation without hallucinating.

The agent architecture solves three core challenges: (1) Multimodal fusion — combining video, audio, and text into a unified emotional timeline; (2) Clinical grounding — ensuring every insight is traceable to actual session data with timestamps; (3) Real-time performance — processing 60 FPS video + 16kHz audio with <200ms latency for live session support.

Data Layer: Multimodal Session Processing

The agent ingests three parallel data streams from each therapy session and synchronizes them into a unified timeline. This data layer is the foundation for all downstream analysis.

Video Stream

  • Input: 60 FPS video from session recording
  • Processing: DeepFace CNN extracts 7 emotion classes per frame (happy, sad, angry, fear, surprise, disgust, neutral)
  • Output: Emotion timeline with confidence scores, aggregated into 10-second windows
  • Storage: JSON array stored in video_analysis table with session_id FK

Audio Stream

  • Input: 16kHz audio from session recording
  • Processing: Whisper transcribes speech-to-text with word-level timestamps; Wav2Vec2 extracts voice stress patterns
  • Output: Timestamped transcript + voice stress markers (pitch shifts, pauses, vocal tremor)
  • Storage: Transcript text in transcripts table; stress markers in audio_analysis

Language Stream

  • Input: Whisper transcript from audio stream
  • Processing: Sentiment analysis per utterance using RoBERTa fine-tuned on clinical psychology text
  • Output: Sentiment scores (positive/negative/neutral) aligned to transcript timestamps
  • Storage: Sentiment array stored alongside transcript in transcripts table

Multimodal Fusion Engine

The fusion engine synchronizes all three data streams into a unified emotional state representation. This is where we detect incongruence — when facial expressions, voice stress, and language sentiment don't align.

Temporal Alignment

  • All streams aligned to 10-second windows — video emotions averaged per window, audio stress aggregated, transcript sentiment mapped to overlapping utterances
  • Redis stores synchronized timeline during processing — enables real-time progress updates to frontend
  • Final timeline stored as JSON: [{window_start, video_emotion, audio_stress, text_sentiment, congruence_score}]
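The 10-second bucketing can be sketched like this. It is a simplification: `to_windows` is a hypothetical helper that averages one numeric signal per window, whereas the real timeline carries separate video, audio, and text fields per window.

```python
def to_windows(events: list[tuple[float, float]], window_s: int = 10) -> dict[int, float]:
    """Bucket timestamped (t_seconds, value) events into fixed windows and average each bucket."""
    buckets: dict[int, list[float]] = {}
    for t, v in events:
        start = int(t // window_s) * window_s   # window start: 0, 10, 20, ...
        buckets.setdefault(start, []).append(v)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```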

Incongruence Detection

  • Congruence score (0–100) computed per window: measures alignment between facial affect, voice stress, and language sentiment
  • Low scores (<60) flag "masked emotion" — e.g., patient says "I'm fine" (positive sentiment) while showing sad facial affect + high voice stress
  • Flagged windows stored as incongruence_flags with severity level — surfaced to clinicians in Analysis Review tab
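One way such a score could be computed, purely as illustration: the weights, the valence scales, and the helper names below are assumptions, not the production formula. The shape of the idea is that opposing facial and verbal valence, or a stressed voice under calm words, pushes the score down.

```python
def congruence_score(video_valence: float, audio_stress: float, text_valence: float) -> float:
    """0-100 alignment score. Valences in [-1, 1], stress in [0, 1].
    Weights (0.7 / 0.3) are illustrative assumptions."""
    affect_gap = abs(video_valence - text_valence) / 2   # 0 (aligned) .. 1 (opposed)
    stress_gap = audio_stress * max(text_valence, 0.0)   # calm words + stressed voice
    penalty = 0.7 * affect_gap + 0.3 * stress_gap
    return round(100 * (1 - min(penalty, 1.0)), 1)

def flag_window(score: float, threshold: float = 60.0) -> bool:
    """Windows below the threshold are flagged as possible masked emotion."""
    return score < threshold
```

The "I'm fine" example from above maps to positive text valence with negative video valence and high stress, which lands well below the threshold and gets flagged.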

LLM Synthesis Layer

  • Gemini 2.5 Flash receives: full transcript, emotion timeline, voice stress markers, incongruence flags
  • Generates structured clinical report: session themes, behavioral observations, risk indicators, recommendations
  • Every claim in the report must cite a timestamp range — forces grounding in actual session data
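The timestamp-grounding rule can also be checked mechanically after generation. Here is a sketch of a validator that rejects report lines lacking a `[start–end]` citation; the regex and helper name are hypothetical, not the production check.

```python
import re

# Matches citations like [20–30s] or [40-50s] (en dash or hyphen, optional trailing 's').
CITATION = re.compile(r"\[\d+\s*[–-]\s*\d+s?\]")

def uncited_claims(report_lines: list[str]) -> list[str]:
    """Return non-empty report lines that lack a timestamp citation."""
    return [line for line in report_lines if line.strip() and not CITATION.search(line)]
```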

Performance Optimization

  • Parallel processing: video, audio, and text streams processed concurrently on separate workers
  • GPU acceleration for DeepFace CNN inference — batch processing 60 frames at once reduces latency from 800ms to <200ms per window
  • Redis fusion cache stores intermediate results — if processing fails mid-session, we resume from last checkpoint instead of restarting
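The frame batching described above is conceptually just fixed-size chunking; a sketch (the actual GPU inference call on each batch is DeepFace-specific and omitted here):

```python
def batch_frames(frames: list, batch_size: int = 60) -> list[list]:
    """Chunk a frame stream into fixed-size batches, one GPU inference call per batch.
    The final batch may be shorter than batch_size."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
```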

Data Layer: Storage & Retrieval Strategy

Every piece of session data flows through a carefully designed storage layer that balances query performance, audit compliance, and clinical workflow needs.

Session Data Tables

  • sessions — core metadata: patient_id, clinician_id, condition_tag, duration, recording_url, analysis_status
  • transcripts — full text transcript with word-level timestamps and sentiment scores
  • video_analysis — emotion timeline JSON array with per-window dominant emotion + confidence
  • audio_analysis — voice stress markers with timestamp ranges and stress type (pitch shift, pause, tremor)
  • clinical_reports — structured Gemini output: themes, observations, risk flags, recommendations

Why This Schema Design

  • Separation of concerns: video, audio, and text analysis stored in separate tables — allows independent reprocessing if a model improves
  • Audit trail: every analysis result is immutable — new analysis creates new rows, old rows never deleted
  • Query efficiency: session-level aggregates (congruence score, flag count) precomputed and stored on sessions row — no runtime joins for dashboard queries
  • HIPAA compliance: all tables protected by Supabase RLS — clinicians can only access sessions for their assigned patients within their clinic

Data Flow: Upload → Analysis → Storage

Session Upload (Supabase Storage)
Deno Edge Trigger
FastAPI Worker Pool
Parallel Processing (Video/Audio/Text)
Redis Fusion Cache
Gemini Synthesis
PostgreSQL Write (Atomic Transaction)
Frontend Polling (Status Update)

Key Design Decisions

Why DeepFace for Emotion Detection?

Tested 5+ facial emotion models (FER+, AffectNet, EmotiW). DeepFace achieved 76% accuracy on our clinical validation set — 23% better than GPT-4 Vision baseline. Critical advantage: runs on CPU with acceptable latency, no expensive GPU inference per frame.

Trade-off: DeepFace misses subtle microexpressions but catches major affect shifts (sad → neutral masking). Good enough for clinical triage, not research-grade.

Why Gemini Over GPT-4?

Gemini 2.5 Flash has native multimodal understanding and 1M token context window — lets us pass entire session transcript + emotion timeline + audio markers in one prompt. GPT-4 would require chunking and multiple API calls.

Cost: Gemini Flash is 10x cheaper than GPT-4 Turbo for long-context tasks. At 200+ sessions/week, this saves $800+/month.

Why Redis for Fusion Cache?

Video and audio processing happen in parallel on separate workers. Redis acts as the synchronization point — each worker writes its results to a shared key, and the fusion engine reads all streams once complete.

Resilience: If a worker crashes mid-processing, Redis cache preserves completed work. We resume from the last successful window instead of restarting the entire session.

Why Postgres Over NoSQL?

Clinical data has strict relational integrity requirements: sessions belong to patients, patients belong to clinics, clinicians have role-based access. Foreign key constraints + RLS policies enforce this at the database level.

Audit compliance: Every table has created_at, updated_at, and created_by columns. Immutable audit log for regulatory review.

Performance & Scale

Latency

<200ms

P99 inference latency per 10-second window. Achieved via GPU batch processing + Redis caching.

Throughput

60 FPS

Video processing rate. DeepFace CNN runs on GPU with batch size 60 — processes 1 second of video in <200ms.

Accuracy

76%

Emotion classification accuracy on clinical validation set. 23% improvement over GPT-4 Vision baseline.

Deployment Infrastructure

  • FastAPI backend deployed on Digital Ocean Kubernetes with GPU-enabled nodes for DeepFace inference
  • Auto-scaling worker pool: 2 workers at baseline, scales to 8 during peak clinic hours (9am–5pm)
  • Redis cluster with 3 replicas for high availability — no single point of failure in fusion cache
  • Monitoring: Prometheus + Grafana dashboards track inference latency, worker queue depth, GPU utilization, error rates
  • 99.8% uptime over 6 months of production operation across 4 clinics

What We Learned Building This

Clinical Validation is Critical

We ran a 3-month validation study with 15 psychiatrists comparing AI-generated reports to their manual notes. Key finding: clinicians trust the AI when every claim has a timestamp citation. Without timestamps, trust dropped 40%.

Design change: Forced Gemini to cite [start–end] timestamp ranges for every observation. Increased prompt complexity but made reports clinically credible.

Multimodal Beats Unimodal by 23%

We A/B tested transcript-only analysis vs full multimodal fusion. Multimodal caught 89% of masked depression cases (patient says "fine" but shows sad affect). Transcript-only caught 66%.

Why it matters: Incongruence detection is the product's core value prop. Without video + audio, we're just another transcription tool.

Real-Time Processing is Non-Negotiable

Early prototype took 5 minutes to process a 45-minute session. Clinicians wouldn't wait — they'd write manual notes instead. We optimized to <200ms per window so analysis feels instant.

How we did it: GPU batch processing, parallel workers, Redis checkpointing, and aggressive caching. Latency is a product feature, not just a performance metric.

No-Hallucination Policy Builds Trust

Gemini occasionally invented patient statements that weren't in the transcript. We added [BRACKETED PLACEHOLDERS] for missing data and a disclaimer header on every report. Clinicians now trust the AI because they know it won't fabricate.

Lesson: In healthcare AI, transparency > completeness. Better to flag missing data than guess.

Architecture

Multi-tenant. Audit-ready. Secure by design.

Frontend

React 18 · TypeScript · Tailwind CSS · Vite

Platform

Supabase Postgres · Row-Level Security · Supabase Storage · Auth + RLS

Edge — 11 Functions

Deno Runtime · Invites + Booking · Client Forms · Billing + Webhooks · AI Proxy

AI Layer

FastAPI (Python) · DeepFace Emotion · Gemini 2.5 Flash · CMS-1500 Generation

Data Flow

Client Browser
React 18 + Vite
Deno Edge Functions
Supabase (RLS + Auth)
PostgreSQL
FastAPI AI Backend
Gemini 2.5 Flash

Core Clinical

patients · sessions · videos · analysis · notes · surveys

Forms System

templates · packets · items · submissions · client_profiles

Billing

invoices · line_items · payments · claims · insurance_profiles

Auth & Admin

profiles · roles · invites · clinics · assignments · audit_logs

How We Built It

Every screen below is production. Here's what each feature does and the technical decisions behind it.

1

Patient Dashboard

The central command center. Every active patient is listed with their real-time risk level, clinical trend tags, session count, and last-contact timestamp — giving clinicians a triage strip at a glance without opening a single chart.

Patient Dashboard

Risk & Trend Scoring

  • Risk level (HIGH / MODERATE / LOW) computed after every AI session analysis and stored on the patient row — no runtime computation
  • Trend tags (ANXIETY, ENGAGEMENT) derived from dominant themes in Gemini session summaries, stored as a tagged array per patient
  • Color-coded risk column uses CSS class switching — HIGH triggers red, MODERATE orange, LOW green

Table Architecture

  • Sessions column distinguishes "X analyzed" vs "X recorded" — DB join across patients → sessions → analysis tables
  • Last Contact timestamp pulled from most recent session or booking, formatted as relative time (e.g. "4d ago")
  • Pinned patient concept — per-clinician preference stored in DB, pinned rows float to the top with a pin icon
2

Patient Workspace — Intake

Every patient has a 5-stage workspace (Intake → Recordings → Analysis Review → Progress → Insurance). The Intake tab enforces document gates before the clinician can proceed to session analysis — HIPAA authorization, treatment consent, and clinical background must all be on file.

Patient Intake

Gated Stage System

  • Stage status computed from DB requirement checks — "Intake requirements met" banner unlocks the Recordings tab
  • 5-stage progress bar rendered as a tab row with ✓ checkmarks for completed stages, "Current" for active, "Pending" for locked
  • Required vs Optional documentation distinguished in UI and enforced in DB — optional docs can be added anytime

File Handling

  • Files uploaded to Supabase Storage with MIME type validation — only PDFs and images accepted for clinical documents
  • Supabase RLS enforces clinic-scoped storage paths — clinicians can only access their own patient documents
  • Document records in DB store bucket path, upload date, and document type for audit trail
3

Patient Workspace — Session Recordings

Clinicians can upload existing recordings or record directly in the browser. Each session is tagged by condition (ANXIETY, OCD, DEPRESSION, BIPOLAR, SUICIDE, ANGER) and gets an "Analyzed" badge once the AI pipeline has finished processing.

Session Recordings

Recording & Upload

  • In-browser recording via the MediaRecorder API — video and audio captured simultaneously with live preview
  • Chunked upload to Supabase Storage with resumable support — large session files handled without timeout
  • Session metadata stored: condition tag, recording date, duration, storage path, analysis status

AI Pipeline Trigger

  • On upload completion, a Deno Edge Function triggers the FastAPI AI backend with the session's storage URL
  • DeepFace processes video frames for emotion timeline; Whisper transcribes audio to text; Gemini synthesizes both
  • Session row status updates from "uploaded" → "processing" → "analyzed" — UI polls for status change
4

Patient Workspace — Analysis Review

Each analyzed session surfaces a proprietary Congruence Index (0–100) and a count of flagged moments. Low scores mean high incongruence — the patient's verbal and non-verbal signals don't match. Clinicians review flagged sessions before proceeding.

Analysis Review

Congruence Index

  • Score derived from alignment between DeepFace emotion timeline and Whisper transcript sentiment — computed per 10-second window, averaged to session score
  • Low (≤60) = significant affect-language incongruence, commonly masked depression or suppressed emotion
  • Severity labels for each band (Low ≤60, Moderate 61–80) allow quick triage without reading full reports first

Flagged Moments

  • Flagged moments = timestamp windows where incongruence spike exceeded threshold — stored as JSON array of [start_s, end_s, type]
  • "Needs review" status blocks progression to Progress tab — clinician must open the full report and acknowledge
  • Sessions are sorted newest-first; session number and condition tag are shown for quick orientation across multiple visits
5

AI Clinical Documentation Report

The core clinical artifact. Gemini 2.5 Flash generates a fully structured report per session: clinical summary, session themes with timestamped transcript quotes, behavioral observations, risk indicators, and clinical recommendations — all grounded in the actual session data with no invented content.

Clinical Documentation Report - Page 1
Clinical Documentation Report - Page 2

Report Structure

  • Clinical Summary: session duration, observed affect (from DeepFace), patient engagement level
  • Session Themes: up to 5 themes with supporting evidence — each quote is timestamped to the exact [start–end] second in the recording
  • Risk Indicators: flagged observations with severity (Pending Clinical Assessment) and status (Requires Review)
  • Clinical Recommendations: future session topics, therapeutic interventions, follow-up actions

No-Hallucination Policy

  • Disclaimer header on every report: "AUTOMATED OBSERVATIONS — FOR CLINICAL REVIEW ONLY. Not for independent diagnostic use."
  • Gemini prompt instructs bracketed placeholders [UNKNOWN] for missing info — AI never invents clinical data
  • Timestamps in quotes ([20–30s], [40–50s]) are extracted from transcript alignment — grounded in actual session time
  • Report stored as structured JSON in DB — sections are individually addressable for insurance packet generation
6

Appointments Calendar

A per-clinician weekly calendar showing all scheduled appointments. Clinicians set availability rules, blocked days, and can generate shareable booking links — all backed by a database-level conflict check that prevents double-booking.

Appointments Calendar

Scheduling Logic

  • Per-clinician availability rules stored in DB: available days, start/end hours, appointment duration defaults
  • has_time_conflict(clinician_id, start, end) PostgreSQL function — runs on every booking creation to prevent overlaps with row-level locking
  • Blocked-day exceptions table allows ad-hoc unavailability (vacations, emergencies) separate from regular schedule

Booking Links

  • Shareable booking links generated via Deno Edge Function — URL contains signed token scoped to clinician + time window
  • Approval-gated: patient books a slot, clinician confirms or declines — no auto-confirmation by default
  • Day/Week toggle; calendar built with CSS grid — time slots are 15-min intervals mapped to grid rows
7

Billing Dashboard

A QuickBooks-inspired invoice management system. Clinicians see outstanding balances, overdue counts, and paid-this-month totals at a glance. Stripe Connect links their bank account for direct payouts. Every invoice has a full state machine from creation to payment.

Billing Dashboard

Invoice State Machine

  • States: draft → sent → viewed → paid — "Overdue" computed if due_date passed and status ≠ paid
  • Outstanding / Paid This Month / Overdue totals computed with aggregate SQL queries on the invoices table — not runtime arithmetic
  • Searchable invoice table with status filter dropdown — DB-level filtering, not client-side
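The state machine and derived "Overdue" flag can be sketched as follows (the states come from the list above; the transition table and helper names are illustrative):

```python
from datetime import date

# Allowed forward transitions: draft → sent → viewed → paid.
TRANSITIONS = {
    "draft": {"sent"},
    "sent": {"viewed", "paid"},
    "viewed": {"paid"},
    "paid": set(),  # terminal state
}

def can_transition(current: str, target: str) -> bool:
    return target in TRANSITIONS.get(current, set())

def is_overdue(status: str, due_date: date, today: date) -> bool:
    """'Overdue' is derived, not stored: past the due date and not yet paid."""
    return status != "paid" and today > due_date
```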

Stripe Connect Integration

  • Clinicians connect their bank account via Stripe Connect OAuth — each clinician is a Stripe Connect account on the platform
  • Commissions endpoint configures platform fee split; payouts go directly to clinician's bank on invoice payment
  • Export CSV outputs all invoices in a format compatible with QuickBooks and standard accounting tools
8

Team Management

Supervisors manage their clinic's clinician roster here — inviting new members, assigning roles, monitoring active status, and disabling access when needed. Every role change is audit-logged.

Team Management

3-Tier RBAC

  • Role dropdown changes the clinician's role in the profiles table — RLS policies re-evaluate on next request
  • Disable sets active_status = false — RLS policies block all table access for inactive users, no token invalidation needed
  • Team stats (Total Members / Therapists / Supervisors) from DB aggregate — no client-side counting

Invite System

  • "Invite Member" triggers Deno Edge Function → generates a single-use UUID token tied to email + role + clinic_id
  • Token stored in invites table with used_at field — atomic update on redemption prevents reuse
  • Invite email sent via Resend with magic-link style URL containing the token — no password required to join
9

AI Insurance Packet — Generation

One click triggers the full insurance packet generation pipeline. Gemini 2.5 Flash synthesizes every analyzed session — pulling clinical summaries, risk indicators, and progress notes — into a structured reauthorization document. The UI shows live progress as each step completes.

Generating Insurance Packet

Generation Pipeline

  • Step 1: Patient demographics loaded — pulls DOB, Patient ID, clinician NPI, practice name from DB
  • Step 2: Session data analyzed — aggregates all JSON clinical reports for the patient across N sessions
  • Step 3: Gemini 2.5 Flash synthesizes data into insurance packet sections: Progress, Medical Necessity, Diagnoses, Treatment Plan

Edge Function Architecture

  • Deno Edge Function generate-insurance-packet proxies to FastAPI AI backend with streaming response
  • Progress steps sent as SSE (Server-Sent Events) — frontend updates loading state in real time as each step completes
  • Generated packet saved to insurance_packets table — immutable draft, all edits stored as new version rows
10

AI Insurance Packet — ICD-10 Codes & Diagnosis

After generation, the AI recommends the appropriate ICD-10 diagnostic codes based on the session themes and risk indicators. Clinicians review the AI-recommended codes, can browse the full ICD-10 database, and edit the diagnosis narrative before signing.

ICD Codes and Diagnosis

ICD-10 Code Recommendation

  • Gemini maps session themes to ICD-10 codes — e.g. ANXIETY sessions → F41.1 (Generalized Anxiety Disorder), F32.A (Depression, Unspecified)
  • Recommended codes displayed as badge chips — clinician can remove codes or add new ones from the ICD-10 browser
  • ICD-10 browser searches a local DB of all current codes — no external API call, full dataset preloaded

Editable Sections & Validation

  • All packet sections are editable textareas — auto-saved to DB every 30s with "Saved Xs ago" timestamp
  • Placeholder count badge (⚠ 1 placeholder) tracks how many [BRACKETED] items remain unfilled before signing is allowed
  • Tip bar reminds clinicians to replace placeholders — sign button stays disabled until all required placeholders are resolved
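The placeholder gate is essentially a scan for unresolved `[BRACKETED]` items; a sketch, where the pattern and helper names are assumptions about the UI logic:

```python
import re

# Uppercase bracketed tokens like [UNKNOWN] or [DIAGNOSIS DATE];
# timestamp citations such as [20–30s] deliberately do not match.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z _-]*\]")

def placeholder_count(text: str) -> int:
    """Count unresolved placeholders in one packet section."""
    return len(PLACEHOLDER.findall(text))

def can_sign(sections: list[str]) -> bool:
    """Sign button stays disabled until every placeholder is resolved."""
    return all(placeholder_count(s) == 0 for s in sections)
```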
11

AI Insurance Packet — Review & Sign

A 3-step wizard (Required Info → Review Sections → Sign & Submit) walks the clinician through completing the packet. The AI has already written the Progress Since Last Authorization and Medical Necessity Statement — the clinician fills in missing profile fields, reviews, and signs.

Review and Sign Insurance Packet

3-Step Wizard

  • Step 1 — Required Info: checklist of missing profile fields (NPI, Practice Name, Address, Insurance Link) with deep-links to fix them
  • Step 2 — Review Sections: full packet content with editable textareas — Progress Since Auth, Medical Necessity, Diagnoses, Treatment Goals
  • Step 3 — Sign & Submit: checkbox confirmation + "I confirm this is accurate" gate before Preview PDF or Sign & Submit

AI-Generated Content Quality

  • "Progress Since Last Authorization" written by Gemini citing session-by-session arc, Congruence Index trends, and specific clinical observations
  • "Medical Necessity Statement" cites specific symptoms, functional impairments, and clinical justification for continued treatment — all sourced from session data
  • "Based on 6 sessions" — packet scope dynamically set to the number of sessions analyzed since last authorization date

Security & Compliance Design

Built for real clinical environments

HIPAA-aligned RLS on every table

Clinic-scoped data isolation

SHA-256 token hashing for client links

Audit logs + admin oversight portal

Active-status enforcement across all operations

Service-role-only unauthenticated access patterns

Technical Highlights

1

Architected full-stack clinical SaaS with 3-tier RBAC, invite-only onboarding, and HIPAA-aligned RLS across 40+ Supabase tables with clinic-scoped data isolation

2

Engineered 5-step gated patient workspace and AI session analysis pipeline (DeepFace emotion timelines + Gemini 2.5 Flash) producing clinical summaries, risk flags, and therapeutic recommendations

3

Shipped 11 Deno Edge Functions covering AI insurance packet generation (CMS-1500), Stripe Connect multi-tenant billing with commission splits, calendar booking with double-booking prevention, and multi-step client forms with SHA-256 token security

4

Built super-admin portal with audit logs, bulk onboarding (50 users), role/status management, and usage analytics across all clinics