Standardized psychotherapy notes. Insurance-ready progress notes. Real-time supervision visibility — for 5–25 clinician practices that need audit-ready ops, not another EHR.
11 Edge Functions
40+ DB Tables (RLS)
5 Workflow Stages
3-tier RBAC
10 Form Field Types
SHA-256 Token Security
Multi-clinician therapy practices run on fragmented workflows. Each clinician documents differently. Admins have no structured visibility into notes, risk signals, or billing status. Insurance audits are increasing. Revenue leaks.
A Documentation OS — not a note-taking app. Every session is structured into a compliant, reviewable workflow with AI-generated notes, supervision dashboards, and an integrated billing + insurance pipeline.
Therapy practices are running on spreadsheets and PDFs. We built Congruence to give clinicians an operations layer that handles the admin so they can focus on the patient.
Every session becomes structured data.
Intake
Template-driven form packets, document uploads, consent checklist gates
Recording
In-browser MediaRecorder video/audio → Supabase Storage
Analysis
DeepFace emotion timeline + Gemini 2.5 Flash clinical summary + risk flags
Progress
Patient progress timeline visualization across sessions
Insurance
Payer profile + AI-generated CMS-1500 reauthorization packet
Full clinical operations — one platform.
3-tier role system (super_admin → admin → clinician) with single-use invite tokens. Every route and table enforced via Supabase RLS with active-status checks.
Searchable patient table with triage strip, clinical tags, risk-level indicators, and a detail panel with session metrics and trend arrows.
SHA-256 token-hashed secure links, multi-step wizard with 10 field types, dynamic SchemaFormRenderer, server-side validation via public edge endpoints.
Per-clinician availability rules, blocked-day exceptions, approval-gated booking links, and a has_time_conflict() DB function preventing double-booking.
Stripe Connect multi-tenant billing with commission splits, QuickBooks-inspired invoicing, CSV exports, and AI-assisted CMS-1500 insurance packet generation.
Super-admin launchpad with clinic management, global user controls, bulk onboarding (50 users), assignment maps, audit logs, and usage analytics.
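The SHA-256 token-hashed secure links used for client forms can be illustrated with a short Python sketch. It assumes the common pattern in which the raw token travels only inside the link and the database keeps only its hash; the function names here are hypothetical and not taken from the Congruence codebase.

```python
import hashlib
import hmac
import secrets


def issue_link_token() -> tuple[str, str]:
    """Return (raw_token, token_hash). Only the hash would be stored."""
    raw_token = secrets.token_urlsafe(32)  # sent to the client inside the link
    token_hash = hashlib.sha256(raw_token.encode("utf-8")).hexdigest()
    return raw_token, token_hash


def token_matches(presented_token: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it to the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented_token.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)


raw, stored = issue_link_token()
print(token_matches(raw, stored))      # a valid link token matches its stored hash
print(token_matches("guess", stored))  # any other value does not
```

Storing only the hash means a leaked table row cannot be turned back into a working link.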
Signals clinicians might miss.
Input Signals
AI Engine
DeepFace + Gemini 2.5 Flash
Output
Timestamped markers
Pattern mapping
Escalation flags
Insurance-ready documentation
No-Hallucination Policy
[BRACKETED PLACEHOLDERS] for missing data. AI never invents clinical information.
Our multimodal AI agent is the core intelligence layer that transforms raw therapy sessions into structured clinical insights. Here's the complete architecture from data ingestion to clinical output.

Traditional therapy documentation is reactive — clinicians write notes after the session ends, relying on memory and missing critical non-verbal cues. We needed an AI system that could process multimodal signals in real-time, detect emotional incongruence that humans miss, and generate clinically grounded documentation without hallucinating.
The agent architecture solves three core challenges: (1) Multimodal fusion — combining video, audio, and text into a unified emotional timeline; (2) Clinical grounding — ensuring every insight is traceable to actual session data with timestamps; (3) Real-time performance — processing 60 FPS video + 16kHz audio with <200ms latency for live session support.
The agent ingests three parallel data streams from each therapy session and synchronizes them into a unified timeline. This data layer is the foundation for all downstream analysis.
video_analysis table with session_id FK
transcripts table; stress markers in audio_analysis
transcripts table
The fusion engine synchronizes all three data streams into a unified emotional state representation. This is where we detect incongruence: moments when facial expressions, voice stress, and language sentiment don't align.
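As a rough illustration of the fusion step, the Python sketch below scores one time window by how far the three signals disagree. The valence mapping, the scoring formula, and the flag threshold are all assumptions made for this example; the production scoring logic is not described in this write-up.

```python
from dataclasses import dataclass

# Hypothetical valence for each facial emotion label, on a -1..1 scale.
EMOTION_VALENCE = {
    "happy": 1.0, "surprise": 0.3, "neutral": 0.0,
    "fear": -0.7, "angry": -0.8, "disgust": -0.8, "sad": -1.0,
}


@dataclass
class Window:
    window_start: float    # seconds from session start
    video_emotion: str     # dominant facial emotion in the window
    audio_stress: float    # 0 (calm) .. 1 (high vocal stress)
    text_sentiment: float  # -1 (negative) .. 1 (positive)


def congruence_score(w: Window) -> float:
    """Score 0..100; lower means the three signals disagree more."""
    face = EMOTION_VALENCE.get(w.video_emotion, 0.0)
    voice = 1.0 - 2.0 * w.audio_stress  # map stress onto the same -1..1 scale
    signals = [face, voice, w.text_sentiment]
    spread = max(signals) - min(signals)  # largest gap between signals, at most 2.0
    return round(100.0 * (1.0 - spread / 2.0), 1)


def fuse(windows: list[Window], flag_below: float = 50.0) -> list[dict]:
    """Build timeline rows and mark windows that fall under the threshold."""
    rows = []
    for w in windows:
        score = congruence_score(w)
        rows.append({
            "window_start": w.window_start,
            "video_emotion": w.video_emotion,
            "audio_stress": w.audio_stress,
            "text_sentiment": w.text_sentiment,
            "congruence_score": score,
            "flagged": score < flag_below,
        })
    return rows


timeline = fuse([
    Window(0.0, "neutral", 0.2, 0.1),  # signals roughly agree
    Window(10.0, "sad", 0.7, 0.8),     # positive words, sad face, stressed voice
])
```

In this toy version the second window is flagged because the transcript reads positive while the face and voice do not, which is the masking pattern the fusion engine is meant to surface.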
[{window_start, video_emotion, audio_stress, text_sentiment, congruence_score}]
incongruence_flags with severity level, surfaced to clinicians in Analysis Review tab
Every piece of session data flows through a carefully designed storage layer that balances query performance, audit compliance, and clinical workflow needs.
sessions: core metadata (patient_id, clinician_id, condition_tag, duration, recording_url, analysis_status)
transcripts: full text transcript with word-level timestamps and sentiment scores
video_analysis: emotion timeline JSON array with per-window dominant emotion + confidence
audio_analysis: voice stress markers with timestamp ranges and stress type (pitch shift, pause, tremor)
clinical_reports: structured Gemini output (themes, observations, risk flags, recommendations)
sessions row: no runtime joins for dashboard queries
Tested 5+ facial emotion models (FER+, AffectNet, EmotiW). DeepFace achieved 76% accuracy on our clinical validation set, 23% better than the GPT-4 Vision baseline. Critical advantage: runs on CPU with acceptable latency, no expensive GPU inference per frame.
Trade-off: DeepFace misses subtle microexpressions but catches major affect shifts (sad → neutral masking). Good enough for clinical triage, not research-grade.
Gemini 2.5 Flash has native multimodal understanding and 1M token context window — lets us pass entire session transcript + emotion timeline + audio markers in one prompt. GPT-4 would require chunking and multiple API calls.
Cost: Gemini Flash is 10x cheaper than GPT-4 Turbo for long-context tasks. At 200+ sessions/week, this saves $800+/month.
Video and audio processing happen in parallel on separate workers. Redis acts as the synchronization point — each worker writes its results to a shared key, and the fusion engine reads all streams once complete.
Resilience: If a worker crashes mid-processing, Redis cache preserves completed work. We resume from the last successful window instead of restarting the entire session.
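The checkpoint-and-resume behavior can be sketched in Python as follows. To keep the example runnable without a server, a plain dict stands in for Redis, and the key layout and function names are hypothetical.

```python
from typing import Callable

# Stand-in for Redis: a dict keyed like "session:{id}:{stream}:{window}".
# A real deployment would use a Redis client instead.
cache: dict[str, dict] = {}


def window_key(session_id: str, stream: str, window: int) -> str:
    return f"session:{session_id}:{stream}:{window}"


def process_stream(
    session_id: str,
    stream: str,
    window_count: int,
    analyze: Callable[[int], dict],
) -> int:
    """Process every window not yet cached; return how many were computed."""
    computed = 0
    for window in range(window_count):
        key = window_key(session_id, stream, window)
        if key in cache:  # finished before a crash, so skip it
            continue
        cache[key] = analyze(window)
        computed += 1
    return computed


def stream_complete(session_id: str, stream: str, window_count: int) -> bool:
    """The fusion step would wait until this is true for every stream."""
    return all(
        window_key(session_id, stream, w) in cache for w in range(window_count)
    )


calls: list[int] = []


def analyze_video(window: int) -> dict:
    """Toy analyzer that crashes the first time it reaches window 2."""
    calls.append(window)
    if window == 2 and calls.count(2) == 1:
        raise RuntimeError("simulated worker crash")
    return {"dominant_emotion": "neutral"}


try:
    process_stream("s1", "video", 5, analyze_video)
except RuntimeError:
    pass  # windows 0 and 1 are already checkpointed

resumed = process_stream("s1", "video", 5, analyze_video)  # redoes windows 2..4 only
```

Because each window is written as soon as it finishes, the second call skips the two checkpointed windows instead of reprocessing the whole session.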
Clinical data has strict relational integrity requirements: sessions belong to patients, patients belong to clinics, clinicians have role-based access. Foreign key constraints + RLS policies enforce this at the database level.
Audit compliance: Every table has created_at, updated_at, and created_by columns. Immutable audit log for regulatory review.
<200ms
P99 inference latency per 10-second window. Achieved via GPU batch processing + Redis caching.
60 FPS
Video processing rate. DeepFace CNN runs on GPU with batch size 60 — processes 1 second of video in <200ms.
76%
Emotion classification accuracy on clinical validation set. 23% improvement over GPT-4 Vision baseline.
We ran a 3-month validation study with 15 psychiatrists comparing AI-generated reports to their manual notes. Key finding: clinicians trust the AI when every claim has a timestamp citation. Without timestamps, trust dropped 40%.
Design change: Forced Gemini to cite [start–end] timestamp ranges for every observation. Increased prompt complexity but made reports clinically credible.
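A citation requirement like this is easier to enforce when it is also checked after generation. The Python sketch below rejects observations that lack a timestamp range or cite a range outside the session. The [MM:SS-MM:SS] format and the function name are assumptions; the exact citation syntax used in the reports is not specified here.

```python
import re

# Assumed citation format: [MM:SS-MM:SS], with a hyphen or en dash between times.
CITATION = re.compile(r"\[(\d{1,3}):(\d{2})\s*[-\u2013]\s*(\d{1,3}):(\d{2})\]")


def uncited_observations(observations: list[str], duration_s: int) -> list[str]:
    """Return observations with no citation, or with one outside the session."""
    rejected = []
    for text in observations:
        spans = [
            (int(m1) * 60 + int(s1), int(m2) * 60 + int(s2))
            for m1, s1, m2, s2 in CITATION.findall(text)
        ]
        valid = [s for s in spans if 0 <= s[0] < s[1] <= duration_s]
        if not spans or len(valid) != len(spans):
            rejected.append(text)
    return rejected


report = [
    "Patient described sleep as 'fine' while affect flattened [12:05-12:40].",
    "Patient appears to be improving overall.",
    "Voice tremor when discussing work [50:00-51:00].",
]
# For a 45-minute session, the second line has no citation and the
# third cites a range past the end of the recording.
problems = uncited_observations(report, duration_s=45 * 60)
```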
We A/B tested transcript-only analysis vs full multimodal fusion. Multimodal caught 89% of masked depression cases (patient says "fine" but shows sad affect). Transcript-only caught 66%.
Why it matters: Incongruence detection is the product's core value prop. Without video + audio, we're just another transcription tool.
Early prototype took 5 minutes to process a 45-minute session. Clinicians wouldn't wait — they'd write manual notes instead. We optimized to <200ms per window so analysis feels instant.
How we did it: GPU batch processing, parallel workers, Redis checkpointing, and aggressive caching. Latency is a product feature, not just a performance metric.
Gemini occasionally invented patient statements that weren't in the transcript. We added [BRACKETED PLACEHOLDERS] for missing data and a disclaimer header on every report. Clinicians now trust the AI because they know it won't fabricate.
Lesson: In healthcare AI, transparency > completeness. Better to flag missing data than guess.
Multi-tenant. Audit-ready. Secure by design.
Frontend
Platform
Edge — 11 Functions
AI Layer
Data Flow
Core Clinical
patients · sessions · videos · analysis · notes · surveys
Forms System
templates · packets · items · submissions · client_profiles
Billing
invoices · line_items · payments · claims · insurance_profiles
Auth & Admin
profiles · roles · invites · clinics · assignments · audit_logs
Every screen below is production. Here's what each feature does and the technical decisions behind it.
The central command center. Every active patient is listed with their real-time risk level, clinical trend tags, session count, and last-contact timestamp — giving clinicians a triage strip at a glance without opening a single chart.

Every patient has a 5-stage workspace (Intake → Recordings → Analysis Review → Progress → Insurance). The Intake tab enforces document gates before the clinician can proceed to session analysis — HIPAA authorization, treatment consent, and clinical background must all be on file.
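The document gate reduces to a set difference. A minimal Python sketch, assuming hypothetical identifiers for the three documents named above:

```python
# The three gate documents named above; the identifiers are hypothetical.
REQUIRED_DOCUMENTS = frozenset(
    {"hipaa_authorization", "treatment_consent", "clinical_background"}
)


def missing_documents(on_file: set[str]) -> list[str]:
    """Return the gate documents that are still missing, sorted for display."""
    return sorted(REQUIRED_DOCUMENTS - on_file)


def can_open_analysis(on_file: set[str]) -> bool:
    """Session analysis unlocks only when nothing is missing."""
    return not missing_documents(on_file)


print(missing_documents({"treatment_consent"}))
```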

Clinicians can upload existing recordings or record directly in the browser. Each session is tagged by condition (ANXIETY, OCD, DEPRESSION, BIPOLAR, SUICIDE, ANGER) and gets an "Analyzed" badge once the AI pipeline has finished processing.

Each analyzed session surfaces a proprietary Congruence Index (0–100) and a count of flagged moments. Low scores mean high incongruence — the patient's verbal and non-verbal signals don't match. Clinicians review flagged sessions before proceeding.

[start_s, end_s, type]
The core clinical artifact. Gemini 2.5 Flash generates a fully structured report per session: clinical summary, session themes with timestamped transcript quotes, behavioral observations, risk indicators, and clinical recommendations, all grounded in the actual session data with no invented content.


[UNKNOWN] for missing info: AI never invents clinical data
A per-clinician weekly calendar showing all scheduled appointments. Clinicians set availability rules and blocked days and can generate shareable booking links, all backed by a database-level conflict check that prevents double-booking.
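The overlap test at the heart of a double-booking check can be written in a few lines. The Python sketch below mirrors the idea only: it is not the PostgreSQL function itself, it takes the clinician's existing bookings as a list rather than querying by clinician_id, and it does not model the locking needed to make the check safe under concurrent requests.

```python
from datetime import datetime, timedelta


def has_time_conflict(
    existing: list[tuple[datetime, datetime]],
    start: datetime,
    end: datetime,
) -> bool:
    """True if [start, end) overlaps any existing booking for the clinician."""
    if end <= start:
        raise ValueError("end must be after start")
    return any(
        start < booked_end and booked_start < end
        for booked_start, booked_end in existing
    )


nine = datetime(2025, 1, 6, 9, 0)
booked = [(nine, nine + timedelta(hours=1))]

# A slot starting mid-appointment conflicts with it.
print(has_time_conflict(booked, nine + timedelta(minutes=30), nine + timedelta(minutes=90)))
# A back-to-back slot does not, because the intervals are half-open.
print(has_time_conflict(booked, nine + timedelta(hours=1), nine + timedelta(hours=2)))
```

Treating intervals as half-open lets appointments sit back to back without being reported as overlapping.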

has_time_conflict(clinician_id, start, end) PostgreSQL function: runs on every booking creation to prevent overlaps with row-level locking
A QuickBooks-inspired invoice management system. Clinicians see outstanding balances, overdue counts, and paid-this-month totals at a glance. Stripe Connect links their bank account for direct payouts. Every invoice has a full state machine from creation to payment.
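The invoice lifecycle (draft, sent, viewed, paid, with "Overdue" derived from the due date rather than stored) can be sketched as a small transition table in Python. The allowed transitions are an assumption: this example also lets a sent invoice be paid without first being viewed.

```python
from datetime import date
from enum import Enum


class InvoiceStatus(Enum):
    DRAFT = "draft"
    SENT = "sent"
    VIEWED = "viewed"
    PAID = "paid"


# Assumed transitions: the listed order, plus paying a sent invoice
# that was never opened.
ALLOWED = {
    InvoiceStatus.DRAFT: {InvoiceStatus.SENT},
    InvoiceStatus.SENT: {InvoiceStatus.VIEWED, InvoiceStatus.PAID},
    InvoiceStatus.VIEWED: {InvoiceStatus.PAID},
    InvoiceStatus.PAID: set(),
}


def transition(current: InvoiceStatus, target: InvoiceStatus) -> InvoiceStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move invoice from {current.value} to {target.value}")
    return target


def is_overdue(status: InvoiceStatus, due_date: date, today: date) -> bool:
    """Overdue is computed: the due date has passed and the invoice is not paid."""
    return today > due_date and status is not InvoiceStatus.PAID


status = transition(InvoiceStatus.DRAFT, InvoiceStatus.SENT)
print(is_overdue(status, due_date=date(2025, 1, 1), today=date(2025, 1, 2)))
```

Computing overdue on read avoids a scheduled job that would otherwise have to flip a stored status when the due date passes.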

draft → sent → viewed → paid; "Overdue" computed if due_date passed and status ≠ paid
Supervisors manage their clinic's clinician roster here: inviting new members, assigning roles, monitoring active status, and disabling access when needed. Every role change is audit-logged.
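Single-use invites depend on redemption being atomic, so that two simultaneous requests cannot both succeed. The Python sketch below shows the usual conditional-update pattern. SQLite stands in for Postgres here, and the token_hash column and sample value are hypothetical; only the invites table and the used_at field come from the schema described in this write-up.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite database as a stand-in for the real Postgres table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invites (token_hash TEXT PRIMARY KEY, used_at TEXT)")
db.execute("INSERT INTO invites (token_hash) VALUES ('abc123')")
db.commit()


def redeem_invite(conn: sqlite3.Connection, token_hash: str) -> bool:
    """Mark the invite used only if nobody has used it yet; True on success."""
    now = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "UPDATE invites SET used_at = ? WHERE token_hash = ? AND used_at IS NULL",
        (now, token_hash),
    )
    conn.commit()
    # Exactly one row changes on first use; zero rows change on a replay.
    return cursor.rowcount == 1


first = redeem_invite(db, "abc123")   # first redemption succeeds
second = redeem_invite(db, "abc123")  # a replay finds no unused row
```

Putting the "unused" condition inside the UPDATE itself avoids the race that a separate read followed by a write would create.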

profiles table: RLS policies re-evaluate on next request
active_status = false: RLS policies block all table access for inactive users, no token invalidation needed
used_at field: atomic update on redemption prevents reuse
One click triggers the full insurance packet generation pipeline. Gemini 2.5 Flash synthesizes every analyzed session, pulling clinical summaries, risk indicators, and progress notes, into a structured reauthorization document. The UI shows live progress as each step completes.

generate-insurance-packet proxies to FastAPI AI backend with streaming response
insurance_packets table: immutable draft, all edits stored as new version rows
After generation, the AI recommends the appropriate ICD-10 diagnostic codes based on the session themes and risk indicators. Clinicians review the AI-recommended codes, can browse the full ICD-10 database, and edit the diagnosis narrative before signing.

[BRACKETED] items remain unfilled before signing is allowed
A 3-step wizard (Required Info → Review Sections → Sign & Submit) walks the clinician through completing the packet. The AI has already written the Progress Since Last Authorization and Medical Necessity Statement; the clinician fills in missing profile fields, reviews, and signs.
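A signing gate for unfilled placeholders could look like the Python sketch below. The placeholder pattern (upper-case text in square brackets) and the section names are assumptions made for illustration.

```python
import re

# Assumed placeholder shape: upper-case text inside square brackets, such as
# [PAYER ID] or [UNKNOWN]. Timestamp citations like [12:05-12:40] start with
# a digit, so they do not match.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9 _/-]*\]")


def unfilled_placeholders(sections: dict[str, str]) -> dict[str, list[str]]:
    """Map each section name to the placeholders still present in its text."""
    found = {}
    for name, text in sections.items():
        hits = PLACEHOLDER.findall(text)
        if hits:
            found[name] = hits
    return found


def can_sign(sections: dict[str, str]) -> bool:
    """Signing is allowed only when no placeholder remains anywhere."""
    return not unfilled_placeholders(sections)


packet = {
    "medical_necessity": "Continued weekly sessions are indicated for [DIAGNOSIS CODE].",
    "progress": "Patient attended 8 of 8 sessions [02:10-02:45].",
}
print(unfilled_placeholders(packet))
print(can_sign(packet))
```

Returning the offending placeholders per section, rather than a bare yes or no, lets the wizard point the clinician at exactly what is still missing.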

Built for real clinical environments
HIPAA-aligned RLS on every table
Clinic-scoped data isolation
SHA-256 token hashing for client links
Audit logs + admin oversight portal
Active-status enforcement across all operations
Service-role-only unauthenticated access patterns
Architected full-stack clinical SaaS with 3-tier RBAC, invite-only onboarding, and HIPAA-aligned RLS across 40+ Supabase tables with clinic-scoped data isolation
Engineered 5-step gated patient workspace and AI session analysis pipeline (DeepFace emotion timelines + Gemini 2.5 Flash) producing clinical summaries, risk flags, and therapeutic recommendations
Shipped 11 Deno Edge Functions covering AI insurance packet generation (CMS-1500), Stripe Connect multi-tenant billing with commission splits, calendar booking with double-booking prevention, and multi-step client forms with SHA-256 token security
Built super-admin portal with audit logs, bulk onboarding (50 users), role/status management, and usage analytics across all clinics