Congruence

Documentation OS for Therapy Practices

Live
Healthcare SaaS
Full-Stack

Standardized psychotherapy notes. Insurance-ready progress notes. Real-time supervision visibility — for 5–25 clinician practices that need audit-ready ops, not another EHR.

11

Edge Functions

40+

DB Tables (RLS)

5

Workflow Stages

3-tier

RBAC

10

Form Field Types

SHA-256

Token Security

Problem

Multi-clinician therapy practices run on fragmented workflows. Each clinician documents differently. Admins have no structured visibility into notes, risk signals, or billing status. Insurance audits are increasing. Revenue leaks.

Solution

A Documentation OS — not a note-taking app. Every session is structured into a compliant, reviewable workflow with AI-generated notes, supervision dashboards, and an integrated billing + insurance pipeline.

Why I Built It

Therapy practices are running on spreadsheets and PDFs. I built Congruence to give clinicians an operations layer that handles the admin work so they can focus on the patient.

Core Workflow

Every session becomes structured data.

Session Capture
Note Structuring
Risk Detection
Compliance
Admin Dashboard
Supervision

Intake

Template-driven form packets, document uploads, consent checklist gates

Recording

In-browser MediaRecorder video/audio → Supabase Storage

Analysis

DeepFace emotion timeline + Gemini 2.5 Flash clinical summary + risk flags

Progress

Patient progress timeline visualization across sessions

Insurance

Payer profile + AI-generated CMS-1500 reauthorization packet

Key Features I Built

Full clinical operations — one platform.

Invite-only RBAC

3-tier role system (super_admin → admin → clinician) with single-use invite tokens. Every route and table enforced via Supabase RLS with active-status checks.

Supabase RLS · JWT · Deno Edge
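The tier ordering can be sketched as a simple rank comparison. This is illustrative only: the role names come from the product, but `ROLE_RANK` and `can_access` are hypothetical helpers, and the real enforcement lives in Supabase RLS policies with active-status checks.

```python
# Sketch of the 3-tier role hierarchy with an active-status guard.
# Role names are from the product; the ranking scheme is an assumption.
ROLE_RANK = {"clinician": 1, "admin": 2, "super_admin": 3}

def can_access(user_role: str, user_active: bool, required_role: str) -> bool:
    """Allow access only for active users at or above the required tier."""
    if not user_active:
        return False  # mirrors the RLS active-status check
    return ROLE_RANK[user_role] >= ROLE_RANK[required_role]
```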

Patient Dashboard

Searchable patient table with triage strip, clinical tags, risk-level indicators, and a detail panel with session metrics and trend arrows.

React 18 · Pagination · Triage

Client Forms System

SHA-256 token-hashed secure links, multi-step wizard with 10 field types, dynamic SchemaFormRenderer, server-side validation via public edge endpoints.

SHA-256 · Deno Edge · Schema-driven
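A minimal sketch of the token scheme described above, using Python's standard library (function names are illustrative; the production code runs in Deno Edge Functions). The point of hashing: only the SHA-256 digest is persisted, so a leaked database row cannot reconstruct the client's secure link.

```python
import hashlib
import secrets

def issue_form_link_token() -> tuple[str, str]:
    """Generate a raw token for the client link and the SHA-256 digest stored server-side."""
    raw = secrets.token_urlsafe(32)                     # embedded in the secure link
    digest = hashlib.sha256(raw.encode()).hexdigest()   # the only thing stored in the DB
    return raw, digest

def verify_token(raw: str, stored_digest: str) -> bool:
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(raw.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)
```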

Scheduling

Per-clinician availability rules, blocked-day exceptions, approval-gated booking links, and a has_time_conflict() DB function preventing double-booking.

PostgreSQL · Edge Functions · Conflict Detection
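The overlap predicate behind a `has_time_conflict()`-style check can be sketched in Python (the real version is a PostgreSQL function running with row-level locking; this shows only the interval logic):

```python
from datetime import datetime

def has_time_conflict(existing: list[tuple[datetime, datetime]],
                      start: datetime, end: datetime) -> bool:
    """Two intervals overlap iff each one starts before the other ends.
    Back-to-back bookings (end == next start) do not conflict."""
    return any(s < end and start < e for s, e in existing)
```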

Billing + Insurance

Stripe Connect multi-tenant billing with commission splits, QuickBooks-inspired invoicing, CSV exports, and AI-assisted CMS-1500 insurance packet generation.

Stripe Connect · Gemini AI · CMS-1500

Admin Portal

Super-admin launchpad with clinic management, global user controls, bulk onboarding (50 users), assignment maps, audit logs, and usage analytics.

Audit Logs · Analytics · Multi-tenant

AI Pipeline

Signals clinicians might miss.

Input Signals

Voice tone shifts
Facial affect patterns
Language incongruence
Cross-session escalation

AI Engine

DeepFace + Gemini 2.5 Flash

Multimodal emotion detection
Affect-language incongruence scoring
Cross-session escalation patterns
Clinical summary generation
Risk flag classification
FastAPI · Deno Edge proxy

Output

Timestamped markers
Pattern mapping
Escalation flags
Insurance-ready documentation

No-Hallucination Policy

Missing data appears as [BRACKETED PLACEHOLDERS]; the AI never invents clinical information.

The AI Agent: How We Built It

Our multimodal AI agent is the core intelligence layer that transforms raw therapy sessions into structured clinical insights. Here's the complete architecture from data ingestion to clinical output.

AI Agent Architecture

Why We Built It This Way

Traditional therapy documentation is reactive — clinicians write notes after the session ends, relying on memory and missing critical non-verbal cues. We needed an AI system that could process multimodal signals in real time, detect emotional incongruence that humans miss, and generate clinically grounded documentation without hallucinating.

The agent architecture solves three core challenges: (1) Multimodal fusion — combining video, audio, and text into a unified emotional timeline; (2) Clinical grounding — ensuring every insight is traceable to actual session data with timestamps; (3) Real-time performance — processing 60 FPS video + 16kHz audio with <200ms latency for live session support.

Data Layer: Multimodal Session Processing

The agent ingests three parallel data streams from each therapy session and synchronizes them into a unified timeline. This data layer is the foundation for all downstream analysis.

Video Stream

  • Input: 60 FPS video from session recording
  • Processing: DeepFace CNN extracts 7 emotion classes per frame (happy, sad, angry, fear, surprise, disgust, neutral)
  • Output: Emotion timeline with confidence scores, aggregated into 10-second windows
  • Storage: JSON array stored in video_analysis table with session_id FK

Audio Stream

  • Input: 16kHz audio from session recording
  • Processing: Whisper transcribes speech-to-text with word-level timestamps; Wav2Vec2 extracts voice stress patterns
  • Output: Timestamped transcript + voice stress markers (pitch shifts, pauses, vocal tremor)
  • Storage: Transcript text in transcripts table; stress markers in audio_analysis

Language Stream

  • Input: Whisper transcript from audio stream
  • Processing: Sentiment analysis per utterance using RoBERTa fine-tuned on clinical psychology text
  • Output: Sentiment scores (positive/negative/neutral) aligned to transcript timestamps
  • Storage: Sentiment array stored alongside transcript in transcripts table

Multimodal Fusion Engine

The fusion engine synchronizes all three data streams into a unified emotional state representation. This is where we detect incongruence — when facial expressions, voice stress, and language sentiment don't align.

Temporal Alignment

  • All streams aligned to 10-second windows — video emotions averaged per window, audio stress aggregated, transcript sentiment mapped to overlapping utterances
  • Redis stores synchronized timeline during processing — enables real-time progress updates to frontend
  • Final timeline stored as JSON: [{window_start, video_emotion, audio_stress, text_sentiment, congruence_score}]
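The 10-second bucketing can be sketched like this. It is a simplification: `to_windows` is a hypothetical helper that averages one numeric signal per window, whereas the real timeline carries separate video, audio, and text fields per window.

```python
def to_windows(events: list[tuple[float, float]], window_s: int = 10) -> dict[int, float]:
    """Bucket timestamped (t_seconds, value) events into fixed windows and average each bucket."""
    buckets: dict[int, list[float]] = {}
    for t, v in events:
        start = int(t // window_s) * window_s   # window start: 0, 10, 20, ...
        buckets.setdefault(start, []).append(v)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```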

Incongruence Detection

  • Congruence score (0–100) computed per window: measures alignment between facial affect, voice stress, and language sentiment
  • Low scores (<60) flag "masked emotion" — e.g., patient says "I'm fine" (positive sentiment) while showing sad facial affect + high voice stress
  • Flagged windows stored as incongruence_flags with severity level — surfaced to clinicians in Analysis Review tab
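One way such a score could be computed, purely as illustration: the weights, the valence scales, and the helper names below are assumptions, not the production formula. The shape of the idea is that opposing facial and verbal valence, or a stressed voice under calm words, pushes the score down.

```python
def congruence_score(video_valence: float, audio_stress: float, text_valence: float) -> float:
    """0-100 alignment score. Valences in [-1, 1], stress in [0, 1].
    Weights (0.7 / 0.3) are illustrative assumptions."""
    affect_gap = abs(video_valence - text_valence) / 2   # 0 (aligned) .. 1 (opposed)
    stress_gap = audio_stress * max(text_valence, 0.0)   # calm words + stressed voice
    penalty = 0.7 * affect_gap + 0.3 * stress_gap
    return round(100 * (1 - min(penalty, 1.0)), 1)

def flag_window(score: float, threshold: float = 60.0) -> bool:
    """Windows below the threshold are flagged as possible masked emotion."""
    return score < threshold
```

The "I'm fine" example from above maps to positive text valence with negative video valence and high stress, which lands well below the threshold and gets flagged.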

LLM Synthesis Layer

  • Gemini 2.5 Flash receives: full transcript, emotion timeline, voice stress markers, incongruence flags
  • Generates structured clinical report: session themes, behavioral observations, risk indicators, recommendations
  • Every claim in the report must cite a timestamp range — forces grounding in actual session data
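The timestamp-grounding rule can also be checked mechanically after generation. Here is a sketch of a validator that rejects report lines lacking a `[start–end]` citation; the regex and helper name are hypothetical, not the production check.

```python
import re

# Matches citations like [20–30s] or [40-50s] (en dash or hyphen, optional trailing 's').
CITATION = re.compile(r"\[\d+\s*[–-]\s*\d+s?\]")

def uncited_claims(report_lines: list[str]) -> list[str]:
    """Return non-empty report lines that lack a timestamp citation."""
    return [line for line in report_lines if line.strip() and not CITATION.search(line)]
```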

Performance Optimization

  • Parallel processing: video, audio, and text streams processed concurrently on separate workers
  • GPU acceleration for DeepFace CNN inference — batch processing 60 frames at once reduces latency from 800ms to <200ms per window
  • Redis fusion cache stores intermediate results — if processing fails mid-session, we resume from last checkpoint instead of restarting
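The frame batching described above is conceptually just fixed-size chunking; a sketch (the actual GPU inference call on each batch is DeepFace-specific and omitted here):

```python
def batch_frames(frames: list, batch_size: int = 60) -> list[list]:
    """Chunk a frame stream into fixed-size batches, one GPU inference call per batch.
    The final batch may be shorter than batch_size."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
```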

Data Layer: Storage & Retrieval Strategy

Every piece of session data flows through a carefully designed storage layer that balances query performance, audit compliance, and clinical workflow needs.

Session Data Tables

  • sessions — core metadata: patient_id, clinician_id, condition_tag, duration, recording_url, analysis_status
  • transcripts — full text transcript with word-level timestamps and sentiment scores
  • video_analysis — emotion timeline JSON array with per-window dominant emotion + confidence
  • audio_analysis — voice stress markers with timestamp ranges and stress type (pitch shift, pause, tremor)
  • clinical_reports — structured Gemini output: themes, observations, risk flags, recommendations

Why This Schema Design

  • Separation of concerns: video, audio, and text analysis stored in separate tables — allows independent reprocessing if a model improves
  • Audit trail: every analysis result is immutable — new analysis creates new rows, old rows never deleted
  • Query efficiency: session-level aggregates (congruence score, flag count) precomputed and stored on sessions row — no runtime joins for dashboard queries
  • HIPAA compliance: all tables protected by Supabase RLS — clinicians can only access sessions for their assigned patients within their clinic

Data Flow: Upload → Analysis → Storage

Session Upload (Supabase Storage)
Deno Edge Trigger
FastAPI Worker Pool
Parallel Processing (Video/Audio/Text)
Redis Fusion Cache
Gemini Synthesis
PostgreSQL Write (Atomic Transaction)
Frontend Polling (Status Update)

Key Design Decisions

Why DeepFace for Emotion Detection?

Tested 5+ facial emotion models (FER+, AffectNet, EmotiW). DeepFace achieved 76% accuracy on our clinical validation set — 23% better than GPT-4 Vision baseline. Critical advantage: runs on CPU with acceptable latency, no expensive GPU inference per frame.

Trade-off: DeepFace misses subtle microexpressions but catches major affect shifts (sad → neutral masking). Good enough for clinical triage, not research-grade.

Why Gemini Over GPT-4?

Gemini 2.5 Flash has native multimodal understanding and 1M token context window — lets us pass entire session transcript + emotion timeline + audio markers in one prompt. GPT-4 would require chunking and multiple API calls.

Cost: Gemini Flash is 10x cheaper than GPT-4 Turbo for long-context tasks. At 200+ sessions/week, this saves $800+/month.

Why Redis for Fusion Cache?

Video and audio processing happen in parallel on separate workers. Redis acts as the synchronization point — each worker writes its results to a shared key, and the fusion engine reads all streams once complete.

Resilience: If a worker crashes mid-processing, Redis cache preserves completed work. We resume from the last successful window instead of restarting the entire session.

Why Postgres Over NoSQL?

Clinical data has strict relational integrity requirements: sessions belong to patients, patients belong to clinics, clinicians have role-based access. Foreign key constraints + RLS policies enforce this at the database level.

Audit compliance: Every table has created_at, updated_at, and created_by columns. Immutable audit log for regulatory review.

Performance & Scale

Latency

<200ms

P99 inference latency per 10-second window. Achieved via GPU batch processing + Redis caching.

Throughput

60 FPS

Video processing rate. DeepFace CNN runs on GPU with batch size 60 — processes 1 second of video in <200ms.

Accuracy

76%

Emotion classification accuracy on clinical validation set. 23% improvement over GPT-4 Vision baseline.

Deployment Infrastructure

  • FastAPI backend deployed on Digital Ocean Kubernetes with GPU-enabled nodes for DeepFace inference
  • Auto-scaling worker pool: 2 workers at baseline, scales to 8 during peak clinic hours (9am–5pm)
  • Redis cluster with 3 replicas for high availability — no single point of failure in fusion cache
  • Monitoring: Prometheus + Grafana dashboards track inference latency, worker queue depth, GPU utilization, error rates
  • 99.8% uptime over 6 months of production operation across 4 clinics

What We Learned Building This

Clinical Validation is Critical

We ran a 3-month validation study with 15 psychiatrists comparing AI-generated reports to their manual notes. Key finding: clinicians trust the AI when every claim has a timestamp citation. Without timestamps, trust dropped 40%.

Design change: Forced Gemini to cite [start–end] timestamp ranges for every observation. Increased prompt complexity but made reports clinically credible.

Multimodal Beats Unimodal by 23%

We A/B tested transcript-only analysis vs full multimodal fusion. Multimodal caught 89% of masked depression cases (patient says "fine" but shows sad affect). Transcript-only caught 66%.

Why it matters: Incongruence detection is the product's core value prop. Without video + audio, we're just another transcription tool.

Real-Time Processing is Non-Negotiable

Early prototype took 5 minutes to process a 45-minute session. Clinicians wouldn't wait — they'd write manual notes instead. We optimized to <200ms per window so analysis feels instant.

How we did it: GPU batch processing, parallel workers, Redis checkpointing, and aggressive caching. Latency is a product feature, not just a performance metric.

No-Hallucination Policy Builds Trust

Gemini occasionally invented patient statements that weren't in the transcript. We added [BRACKETED PLACEHOLDERS] for missing data and a disclaimer header on every report. Clinicians now trust the AI because they know it won't fabricate.

Lesson: In healthcare AI, transparency > completeness. Better to flag missing data than guess.

Architecture

Multi-tenant. Audit-ready. Secure by design.

Frontend

React 18 · TypeScript · Tailwind CSS · Vite

Platform

Supabase Postgres · Row-Level Security · Supabase Storage · Auth + RLS

Edge — 11 Functions

Deno Runtime · Invites + Booking · Client Forms · Billing + Webhooks · AI Proxy

AI Layer

FastAPI (Python) · DeepFace Emotion · Gemini 2.5 Flash · CMS-1500 Generation

Data Flow

Client Browser
React 18 + Vite
Deno Edge Functions
Supabase (RLS + Auth)
PostgreSQL
FastAPI AI Backend
Gemini 2.5 Flash

Core Clinical

patients · sessions · videos · analysis · notes · surveys

Forms System

templates · packets · items · submissions · client_profiles

Billing

invoices · line_items · payments · claims · insurance_profiles

Auth & Admin

profiles · roles · invites · clinics · assignments · audit_logs

How We Built It

Every screen below is production. Here's what each feature does and the technical decisions behind it.

1

Patient Dashboard

The central command center. Every active patient is listed with their real-time risk level, clinical trend tags, session count, and last-contact timestamp — giving clinicians a triage strip at a glance without opening a single chart.

Patient Dashboard

Risk & Trend Scoring

  • Risk level (HIGH / MODERATE / LOW) computed after every AI session analysis and stored on the patient row — no runtime computation
  • Trend tags (ANXIETY, ENGAGEMENT) derived from dominant themes in Gemini session summaries, stored as a tagged array per patient
  • Color-coded risk column uses CSS class switching — HIGH triggers red, MODERATE orange, LOW green

Table Architecture

  • Sessions column distinguishes "X analyzed" vs "X recorded" — DB join across patients → sessions → analysis tables
  • Last Contact timestamp pulled from most recent session or booking, formatted as relative time (e.g. "4d ago")
  • Pinned patient concept — per-clinician preference stored in DB, pinned rows float to the top with a pin icon
2

Patient Workspace — Intake

Every patient has a 5-stage workspace (Intake → Recordings → Analysis Review → Progress → Insurance). The Intake tab enforces document gates before the clinician can proceed to session analysis — HIPAA authorization, treatment consent, and clinical background must all be on file.

Patient Intake

Gated Stage System

  • Stage status computed from DB requirement checks — "Intake requirements met" banner unlocks the Recordings tab
  • 5-stage progress bar rendered as a tab row with ✓ checkmarks for completed stages, "Current" for active, "Pending" for locked
  • Required vs Optional documentation distinguished in UI and enforced in DB — optional docs can be added anytime

File Handling

  • Files uploaded to Supabase Storage with MIME type validation — only PDFs and images accepted for clinical documents
  • Supabase RLS enforces clinic-scoped storage paths — clinicians can only access their own patient documents
  • Document records in DB store bucket path, upload date, and document type for audit trail
3

Patient Workspace — Session Recordings

Clinicians can upload existing recordings or record directly in the browser. Each session is tagged by condition (ANXIETY, OCD, DEPRESSION, BIPOLAR, SUICIDE, ANGER) and gets an "Analyzed" badge once the AI pipeline has finished processing.

Session Recordings

Recording & Upload

  • In-browser recording via the MediaRecorder API — video and audio captured simultaneously with live preview
  • Chunked upload to Supabase Storage with resumable support — large session files handled without timeout
  • Session metadata stored: condition tag, recording date, duration, storage path, analysis status

AI Pipeline Trigger

  • On upload completion, a Deno Edge Function triggers the FastAPI AI backend with the session's storage URL
  • DeepFace processes video frames for emotion timeline; Whisper transcribes audio to text; Gemini synthesizes both
  • Session row status updates from "uploaded" → "processing" → "analyzed" — UI polls for status change
4

Patient Workspace — Analysis Review

Each analyzed session surfaces a proprietary Congruence Index (0–100) and a count of flagged moments. Low scores mean high incongruence — the patient's verbal and non-verbal signals don't match. Clinicians review flagged sessions before proceeding.

Analysis Review

Congruence Index

  • Score derived from alignment between DeepFace emotion timeline and Whisper transcript sentiment — computed per 10-second window, averaged to session score
  • Low (≤60) = significant affect-language incongruence, commonly masked depression or suppressed emotion
  • Severity labels for each band (Low ≤60, Moderate 61–80) allow quick triage without reading full reports first

Flagged Moments

  • Flagged moments = timestamp windows where incongruence spike exceeded threshold — stored as JSON array of [start_s, end_s, type]
  • "Needs review" status blocks progression to Progress tab — clinician must open the full report and acknowledge
  • Sessions are sorted newest-first; session number and condition tag are shown for quick orientation across multiple visits
5

AI Clinical Documentation Report

The core clinical artifact. Gemini 2.5 Flash generates a fully structured report per session: clinical summary, session themes with timestamped transcript quotes, behavioral observations, risk indicators, and clinical recommendations — all grounded in the actual session data with no invented content.

Clinical Documentation Report - Page 1
Clinical Documentation Report - Page 2

Report Structure

  • Clinical Summary: session duration, observed affect (from DeepFace), patient engagement level
  • Session Themes: up to 5 themes with supporting evidence — each quote is timestamped to the exact [start–end] second in the recording
  • Risk Indicators: flagged observations with severity (Pending Clinical Assessment) and status (Requires Review)
  • Clinical Recommendations: future session topics, therapeutic interventions, follow-up actions

No-Hallucination Policy

  • Disclaimer header on every report: "AUTOMATED OBSERVATIONS — FOR CLINICAL REVIEW ONLY. Not for independent diagnostic use."
  • Gemini prompt instructs bracketed placeholders [UNKNOWN] for missing info — AI never invents clinical data
  • Timestamps in quotes ([20–30s], [40–50s]) are extracted from transcript alignment — grounded in actual session time
  • Report stored as structured JSON in DB — sections are individually addressable for insurance packet generation
6

Appointments Calendar

A per-clinician weekly calendar showing all scheduled appointments. Clinicians set availability rules, blocked days, and can generate shareable booking links — all backed by a database-level conflict check that prevents double-booking.

Appointments Calendar

Scheduling Logic

  • Per-clinician availability rules stored in DB: available days, start/end hours, appointment duration defaults
  • has_time_conflict(clinician_id, start, end) PostgreSQL function — runs on every booking creation to prevent overlaps with row-level locking
  • Blocked-day exceptions table allows ad-hoc unavailability (vacations, emergencies) separate from regular schedule

Booking Links

  • Shareable booking links generated via Deno Edge Function — URL contains signed token scoped to clinician + time window
  • Approval-gated: patient books a slot, clinician confirms or declines — no auto-confirmation by default
  • Day/Week toggle; calendar built with CSS grid — time slots are 15-min intervals mapped to grid rows
7

Billing Dashboard

A QuickBooks-inspired invoice management system. Clinicians see outstanding balances, overdue counts, and paid-this-month totals at a glance. Stripe Connect links their bank account for direct payouts. Every invoice has a full state machine from creation to payment.

Billing Dashboard

Invoice State Machine

  • States: draft → sent → viewed → paid — "Overdue" computed if due_date passed and status ≠ paid
  • Outstanding / Paid This Month / Overdue totals computed with aggregate SQL queries on the invoices table — not runtime arithmetic
  • Searchable invoice table with status filter dropdown — DB-level filtering, not client-side
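The state machine and derived "Overdue" flag can be sketched as follows (the states come from the list above; the transition table and helper names are illustrative):

```python
from datetime import date

# Allowed forward transitions: draft → sent → viewed → paid.
TRANSITIONS = {
    "draft": {"sent"},
    "sent": {"viewed", "paid"},
    "viewed": {"paid"},
    "paid": set(),  # terminal state
}

def can_transition(current: str, target: str) -> bool:
    return target in TRANSITIONS.get(current, set())

def is_overdue(status: str, due_date: date, today: date) -> bool:
    """'Overdue' is derived, not stored: past the due date and not yet paid."""
    return status != "paid" and today > due_date
```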

Stripe Connect Integration

  • Clinicians connect their bank account via Stripe Connect OAuth — each clinician is a Stripe Connect account on the platform
  • Commissions endpoint configures platform fee split; payouts go directly to clinician's bank on invoice payment
  • Export CSV outputs all invoices in a format compatible with QuickBooks and standard accounting tools
8

Team Management

Supervisors manage their clinic's clinician roster here — inviting new members, assigning roles, monitoring active status, and disabling access when needed. Every role change is audit-logged.

Team Management

3-Tier RBAC

  • Role dropdown changes the clinician's role in the profiles table — RLS policies re-evaluate on next request
  • Disable sets active_status = false — RLS policies block all table access for inactive users, no token invalidation needed
  • Team stats (Total Members / Therapists / Supervisors) from DB aggregate — no client-side counting

Invite System

  • "Invite Member" triggers Deno Edge Function → generates a single-use UUID token tied to email + role + clinic_id
  • Token stored in invites table with used_at field — atomic update on redemption prevents reuse
  • Invite email sent via Resend with magic-link style URL containing the token — no password required to join
9

AI Insurance Packet — Generation

One click triggers the full insurance packet generation pipeline. Gemini 2.5 Flash synthesizes every analyzed session — pulling clinical summaries, risk indicators, and progress notes — into a structured reauthorization document. The UI shows live progress as each step completes.

Generating Insurance Packet

Generation Pipeline

  • Step 1: Patient demographics loaded — pulls DOB, Patient ID, clinician NPI, practice name from DB
  • Step 2: Session data analyzed — aggregates all JSON clinical reports for the patient across N sessions
  • Step 3: Gemini 2.5 Flash synthesizes data into insurance packet sections: Progress, Medical Necessity, Diagnoses, Treatment Plan

Edge Function Architecture

  • Deno Edge Function generate-insurance-packet proxies to FastAPI AI backend with streaming response
  • Progress steps sent as SSE (Server-Sent Events) — frontend updates loading state in real time as each step completes
  • Generated packet saved to insurance_packets table — immutable draft, all edits stored as new version rows
10

AI Insurance Packet — ICD-10 Codes & Diagnosis

After generation, the AI recommends the appropriate ICD-10 diagnostic codes based on the session themes and risk indicators. Clinicians review the AI-recommended codes, can browse the full ICD-10 database, and edit the diagnosis narrative before signing.

ICD Codes and Diagnosis

ICD-10 Code Recommendation

  • Gemini maps session themes to ICD-10 codes — e.g. ANXIETY sessions → F41.1 (Generalized Anxiety Disorder), F32.A (Depression, Unspecified)
  • Recommended codes displayed as badge chips — clinician can remove codes or add new ones from the ICD-10 browser
  • ICD-10 browser searches a local DB of all current codes — no external API call, full dataset preloaded

Editable Sections & Validation

  • All packet sections are editable textareas — auto-saved to DB every 30s with "Saved Xs ago" timestamp
  • Placeholder count badge (⚠ 1 placeholder) tracks how many [BRACKETED] items remain unfilled before signing is allowed
  • Tip bar reminds clinicians to replace placeholders — sign button stays disabled until all required placeholders are resolved
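The placeholder gate is essentially a scan for unresolved `[BRACKETED]` items; a sketch, where the pattern and helper names are assumptions about the UI logic:

```python
import re

# Uppercase bracketed tokens like [UNKNOWN] or [DIAGNOSIS DATE];
# timestamp citations such as [20–30s] deliberately do not match.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z _-]*\]")

def placeholder_count(text: str) -> int:
    """Count unresolved placeholders in one packet section."""
    return len(PLACEHOLDER.findall(text))

def can_sign(sections: list[str]) -> bool:
    """Sign button stays disabled until every placeholder is resolved."""
    return all(placeholder_count(s) == 0 for s in sections)
```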
11

AI Insurance Packet — Review & Sign

A 3-step wizard (Required Info → Review Sections → Sign & Submit) walks the clinician through completing the packet. The AI has already written the Progress Since Last Authorization and Medical Necessity Statement — the clinician fills in missing profile fields, reviews, and signs.

Review and Sign Insurance Packet

3-Step Wizard

  • Step 1 — Required Info: checklist of missing profile fields (NPI, Practice Name, Address, Insurance Link) with deep-links to fix them
  • Step 2 — Review Sections: full packet content with editable textareas — Progress Since Auth, Medical Necessity, Diagnoses, Treatment Goals
  • Step 3 — Sign & Submit: checkbox confirmation + "I confirm this is accurate" gate before Preview PDF or Sign & Submit

AI-Generated Content Quality

  • "Progress Since Last Authorization" written by Gemini citing session-by-session arc, Congruence Index trends, and specific clinical observations
  • "Medical Necessity Statement" cites specific symptoms, functional impairments, and clinical justification for continued treatment — all sourced from session data
  • "Based on 6 sessions" — packet scope dynamically set to the number of sessions analyzed since last authorization date

Security & Compliance Design

Built for real clinical environments

HIPAA-aligned RLS on every table

Clinic-scoped data isolation

SHA-256 token hashing for client links

Audit logs + admin oversight portal

Active-status enforcement across all operations

Service-role-only unauthenticated access patterns

Technical Highlights

1

Architected full-stack clinical SaaS with 3-tier RBAC, invite-only onboarding, and HIPAA-aligned RLS across 40+ Supabase tables with clinic-scoped data isolation

2

Engineered 5-step gated patient workspace and AI session analysis pipeline (DeepFace emotion timelines + Gemini 2.5 Flash) producing clinical summaries, risk flags, and therapeutic recommendations

3

Shipped 11 Deno Edge Functions covering AI insurance packet generation (CMS-1500), Stripe Connect multi-tenant billing with commission splits, calendar booking with double-booking prevention, and multi-step client forms with SHA-256 token security

4

Built super-admin portal with audit logs, bulk onboarding (50 users), role/status management, and usage analytics across all clinics