| Solution Area | AI Intensity | What AI Does | Deterministic Control | Configurable by Client | AI Fallback |
|---|---|---|---|---|---|
| **AI-Powered Experience Layer**: AI drives the primary member interaction | |||||
| **Natural Language Understanding**<br>Intent & Slot Extraction | | The LLM classifies every member message into 28+ intent categories in real time, resolving pronouns, contextual follow-ups ("what about that one?"), paraphrasing, and bilingual Arabic/English in a single call.<br>*gpt-5-nano (intent) · Bilingual EN/AR · 28 intents · Slot extraction*<br>Slot extraction pulls structured slots (claim IDs, preauth IDs, benefit codes, city, provider type, in-network flag) alongside the intent, turning a plain sentence into a queryable API call. | **GUARD: Post-LLM corrections.** Deterministic guards override systematic LLM mistakes, e.g. "pending claim" language always routes to CLAIM_LIST, and specific preauth IDs always map to PREAUTH_STATUS, regardless of LLM output.<br>**FALLBACK: Regex engine.** A keyword-pattern regex classifier across 8 intent families is always available. If the LLM times out or is disabled, deterministic classification runs transparently; members never see a failure. | AI mode: Deterministic Only / Hybrid Azure / Hybrid Local<br>Provider: OpenAI, Azure OpenAI, local Ollama, or managed<br>Model: any OpenAI-compatible model via DB config | Regex engine |
| **Response Generation**<br>Naturalization & Voice | | Every structured answer from the deterministic layer passes through LLM naturalization, transforming clinical facts into warm, human, conversational prose that sounds like a knowledgeable insurance adviser, not a form.<br>*Tone adaptation · Empathy calibration · EN/AR phrasing · Contextual persona voice*<br>For GREETING, HELP, and complex multi-turn queries, the LLM generates full character responses, using the member's live claim and preauth history as context and proactively connecting what the member said to what is actually in their file. | **HARD RULES: Factual preservation.** The system prompt enforces that all benefit names, percentages, claim IDs, reference numbers, dates, and amounts are preserved exactly. AI cannot soften coverage exclusions or invent claim reasons.<br>**CLAIM STATUS RULE.** If the context says "InReview", the LLM must say "under review", never "denied" or "rejected", unless the actual context contains that reason.<br>**BANNED PHRASE LIBRARY.** 14 banned phrases (e.g. "feel free to ask", "Good question!") are explicitly prohibited in every LLM call. | Response tone: Formal / Empathetic / Concise<br>Length: Brief / Standard / Detailed<br>Language complexity: Plain / Mixed / Clinical<br>Denial empathy: Standard / High / Maximum<br>Disclaimer level: None / Minimal / Standard / Strict | Deterministic text as-is |
| **AI Persona Engine**<br>Identity & A/B Testing | | The AI copilot operates as a named persona with its own voice, character, and communication style, injected into every LLM call as a persona voice layer on top of the base character definition.<br>*Persona voice injection · A/B test arms · Live experiment support*<br>Multiple personas can run simultaneously as A/B test arms: members are hashed to a stable arm for the duration of the experiment, and arm performance is tracked per persona. Clients can run persona experiments without any code changes. | **PRIORITY ORDER.** Persona resolution is fully deterministic: A/B arm → assignment rules → tenant default → platform default. Rules fire in priority order and are inspectable at runtime.<br>**STABLE ASSIGNMENT.** Member-to-arm assignment uses a deterministic hash: the same member always gets the same persona arm for a given test, preventing experience thrashing. Persona voice is an instruction layer; it cannot override factual rules, banned phrases, or claim status constraints in the base system prompt. | Tenant default persona<br>Assignment rules: segment, service type, language, urgency, channel<br>A/B experiments: traffic split, arm labels, target segment<br>Pause / end experiments at any time | Platform default persona |
| **Behavioral Tuning**<br>Communication Governance | | An admin-configurable behavioral profile shapes how the AI speaks to every member on every turn: tone, empathy level for bad news, response length, proactive suggestions, disclaimer strictness, and how quickly the AI offers human escalation.<br>*Tone & voice control · Denial empathy prefixes · Waiting acknowledgement · Proactive next steps*<br>The profile is passed to the LLM as behavioral instructions on each call, enabling naturally varied, contextually appropriate closings rather than repetitive canned phrases. | **DENIAL DETECTION.** Keyword detection ("denied", "rejected", "not covered") fires deterministically; an empathy prefix is prepended before LLM naturalization, regardless of AI mood.<br>**PENDING DETECTION.** "Pending / under review / awaiting" language triggers a deterministic waiting-acknowledgement prefix before the LLM naturalizes.<br>**ADMIN BYPASS.** The behavioral profile is member/broker-facing only; admin users always receive direct, unmodified technical output, bypassing all behavioral transforms.<br>**ESCALATION GATE.** Escalation tendency (Low/Medium/High) deterministically adds a "Request callback" suggested action when profile and situation match; AI cannot suppress this. | Tone: Formal / Empathetic / Concise<br>Denial empathy: Standard / High / Maximum<br>Escalation tendency: Low / Medium / High<br>Proactive next steps on/off<br>Always offer case creation on/off<br>Disclaimer level: None → Strict | Default profile |
| **Document Intelligence**<br>OCR & Medical Extraction | | AI reads and extracts structured fields from medical documents (invoices, prescriptions, lab reports, discharge summaries) using an OCR pipeline followed by a medical extraction validator.<br>*OCR pipeline · Handwriting detection · Medical field extraction · Multilingual documents*<br>A prompt template registry routes different document types to appropriate extraction tasks. Confidence scores are computed per field, and handwriting risk is assessed per document. | **REVIEW DECISION ENGINE.** A fully deterministic gate inspects OCR confidence, handwriting risk level, missing critical fields, suspicious values, date anomalies, and amount inconsistencies before any AI-extracted field is used in an authoritative decision.<br>**15 reason codes.** The gate uses 15 explicit reason codes (low_ocr_confidence, critical_handwriting_prescription, amount_inconsistency, etc.) to decide whether the result requires human clinical review, blocking automation if thresholds are not met.<br>**LLM RESPONSE GUARD.** All LLM-extracted output passes through a response guard before leaving the pipeline. | OCR confidence threshold<br>Review sensitivity level<br>Handwriting risk tolerance<br>Clinical reviewer routing | Route to human review |
| **Intelligent Orchestration**: AI guides the flow, deterministic rules govern transitions | |||||
| **4-Phase Conversation Engine**<br>Orchestration & Flow | | Every conversation flows through four AI-orchestrated phases: encounter micro-feedback ("how was your last visit?"), primary need resolution, extended service opportunity, and experience collection.<br>*Phase 0: Encounter feedback · Phase 1: Resolve need · Phase 2: Extended services · Phase 3: Experience*<br>The LLM evaluates Phase 1 readiness (has the member's need been fulfilled?), generates natural bridge text between phases, and directs Phase 0 conversations across 4 structured topics (health, doctor, facility, care continuity). | **PHASE TRANSITION GATES.** Phases 2 and 3 are locked behind Phase 1 readiness. A deterministic suppression policy can block Phase 2 commerce offers entirely for a session.<br>**BYPASS RULES.** Urgent keywords (Arabic and English), health symptom detection, and explicit service requests immediately bypass Phase 0, deterministically, before any LLM is called. A member asking about their claim is never stopped for encounter feedback.<br>**DRIFT DETECTION.** Conversation drift from insurance topics is measured by keyword presence across turns, triggering LLM re-centering if the conversation wanders. | Phase 0 enabled/disabled<br>Phase 2 commerce gating<br>Re-centering templates<br>Phase eligibility thresholds | Skip to Phase 1 |
| **Handoff & Escalation Intelligence**<br>Human Transfer Engine | | The platform continuously monitors AI confidence across every turn and automatically escalates to a human agent when it detects that AI is not the right tool, before the member gets frustrated.<br>*AI confidence monitoring · Emotional exhaustion detection · Arabic + English patterns · 12 signal categories*<br>Detects emotional exhaustion ("been waiting forever", "nobody helps"), formal complaints, legal threats ("lawyer", "ombudsman"), and explicit human requests, in both Arabic and English. | **HARD CONFIDENCE THRESHOLD.** If AI confidence falls below 0.35, handoff fires immediately, regardless of any other signal. No AI discretion at this point.<br>**SOFT THRESHOLD COMBO.** Confidence below 0.50 plus 4+ unresolved turns triggers co-pilot mode. More than 10 turns total is an absolute ceiling: always escalate.<br>**QUEUE ROUTING.** Each trigger family deterministically routes to a named queue: complaints → Complaints Queue, identity failure → Escalations, provider friction → Provider Support. Queue routing cannot be overridden by AI. | Confidence thresholds<br>Turn count ceilings<br>VIP / sensitive category routing<br>QA sampling rate<br>Escalation tendency: Low / Medium / High | Human queue |
| **Opportunity Intelligence**<br>Next-Best-Action Scoring | | A 10-dimensional intelligent scorer decides when, and what, to offer a member after their need is resolved: health journey relevance, friction reduction opportunity, emotional sensitivity, domain adjacency (e.g. an MRI journey → physio adjacency), and commerce fit are all weighed.<br>*10 scoring dimensions · Domain adjacency map · Emotional sensitivity · Trust preservation score*<br>A predictive plugin architecture is in place to allow a machine learning model to override any dimension with learned predictions from member history, without changing the scoring framework. | **TRUST FIRST.** The trust preservation score (weighted 0.25, the highest single weight) always dominates. If a member is in distress, has a sensitive diagnosis, or is in an emergency context, intrusiveness risk scores 0.95+ and commerce is suppressed.<br>**SUPPRESSION RULES.** If a member has explicitly rejected an offer, or has ignored 2+ offers in the session, the suppression engine adds a hard override regardless of the composite score. All 10 dimension scorers are deterministic rule-based functions; no LLM is involved in the scoring itself. The ML plugin slot is available for future predictive models. | Phase 2 commerce enable/disable<br>Offer suppression rules<br>Domain adjacency weights<br>In-house vs vendor preference | No offer shown |
| **Ground Truth Foundation**: deterministic facts that no AI layer may modify | |||||
| **Domain Knowledge & Plan Facts**<br>Claims, Benefits, Preauth | | AI does not generate claim status, coverage percentages, benefit limits, or preauth decisions. These are always read from the insurer's system of record and presented with 100% deterministic confidence.<br>*LLM role: phrasing only*<br>The response naturalization layer may rephrase the presentation of facts, but cannot change the facts themselves. Coverage is what the plan says it is. | **AUTHORITATIVE.** The claim status interpretation engine maps 8 canonical claim statuses to member-safe summaries and next actions. The confidence score is always 1.0. Benefit explanation reads coverage %, annual limit, per-visit limit, pre-auth requirement, referral requirement, network restriction, waiting period, and exclusions directly from the canonical benefit record, with the benefit code cited as evidence in the audit log.<br>**AUDIT TRAIL.** Every domain engine run produces a full audit annotation referencing the canonical entity (claim ID, benefit code). Evidence is never inferred. | Permission scopes: view_claims, view_benefits<br>Locale: EN / AR output<br>Escalation flag thresholds | Degraded mode |
| **Access, Consent & Compliance**<br>Governance Layer, Stage 1 | | No AI involvement by design. These engines run first in every pipeline invocation, before any AI call is made. Their decisions cannot be overridden by anything that runs after them. This is intentional architecture: governance decisions must be fully explainable, auditable, and deterministic. | **RUNS FIRST, always.** Access Policy evaluates actor role plus session assurance (none / low / medium / high) against required levels per data scope, producing an allowed vs. denied scope list before any other engine touches the request.<br>**CONSENT GATE.** If ai_processing_consent is missing, the entire AI Assist family is denied. No engine can bypass the consent gate. PII masking is activated independently of AI mode.<br>**FULL AUDIT.** Every governance decision produces a timestamped audit annotation (actor type, assurance level, granted/denied scopes, consent scopes present/missing) for regulatory reporting. | Assurance requirements per scope<br>Consent scope requirements<br>Strict vs standard consent mode<br>PII masking policy | Hard block |
| **AI Runtime Modes** (globally switchable): DETERMINISTIC_ONLY, all AI calls disabled, full deterministic fallbacks active · HYBRID_ASSISTED_LOCAL, OpenAI-compatible model via managed integration · HYBRID_ASSISTED_AZURE, Azure OpenAI with configurable endpoint, version, and timeout · LLM_DISABLED_BY_POLICY, tenant or consent policy has blocked AI. The mode is fetched from the DB on each request (30s cache); no restart is required to change it. | |||||
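The GUARD and FALLBACK behavior in the Natural Language Understanding row can be sketched in a few lines: deterministic corrections always win over the LLM's intent, and a keyword regex fallback answers when the LLM is unavailable. This is a minimal illustration, not the platform's code: the `PA-\d+` preauth ID format and the `BENEFIT_QUERY` / `GENERAL_HELP` labels are assumptions (only CLAIM_LIST and PREAUTH_STATUS appear in the table).

```python
import re

# Deterministic guard patterns (abbreviated; the real engine covers
# 8 intent families in both English and Arabic).
PENDING_CLAIM = re.compile(r"\bpending claims?\b", re.IGNORECASE)
PREAUTH_ID = re.compile(r"\bPA-\d+\b", re.IGNORECASE)  # ID format is assumed

def classify_intent(message: str, llm_intent: str | None) -> str:
    """Apply deterministic guards over an optional LLM classification.

    llm_intent is None when the LLM timed out or is disabled; the regex
    fallback then produces a best-effort intent so members never see a failure.
    """
    # GUARD: systematic corrections always override the LLM output.
    if PENDING_CLAIM.search(message):
        return "CLAIM_LIST"
    if PREAUTH_ID.search(message):
        return "PREAUTH_STATUS"
    if llm_intent is not None:
        return llm_intent
    # FALLBACK: keyword-pattern classification when no LLM result exists.
    if re.search(r"\b(covered|coverage|benefit)\b", message, re.IGNORECASE):
        return "BENEFIT_QUERY"
    return "GENERAL_HELP"
```

Because the guards run after the LLM, a misclassified "show my pending claims" is corrected to CLAIM_LIST no matter what the model returned.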
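The STABLE ASSIGNMENT rule in the AI Persona Engine row can be sketched as a deterministic hash over member and experiment IDs. This is a sketch under stated assumptions: the function name, the choice of SHA-256, and the `(label, fraction)` arm format are illustrative, not the platform's actual implementation.

```python
import hashlib

def assign_arm(member_id: str, experiment_id: str,
               arms: list[tuple[str, float]]) -> str:
    """Deterministically map a member to an A/B persona arm.

    arms: list of (arm_label, traffic_fraction); fractions should sum to 1.0.
    The same (member_id, experiment_id) pair always yields the same arm,
    which prevents experience thrashing mid-experiment.
    """
    digest = hashlib.sha256(f"{experiment_id}:{member_id}".encode()).hexdigest()
    # Map the first 8 hex chars to a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for label, fraction in arms:
        cumulative += fraction
        if bucket < cumulative:
            return label
    return arms[-1][0]  # guard against floating-point rounding at the edge
```

Hashing on the experiment ID as well as the member ID means a member can land in different arms of *different* experiments while staying stable within each one.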
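The DENIAL DETECTION, PENDING DETECTION, and ADMIN BYPASS rules in the Behavioral Tuning row compose into a small deterministic transform that runs before LLM naturalization. The keyword lists come from the table; the prefix strings themselves are hypothetical stand-ins (the real ones are tenant-configurable).

```python
DENIAL_KEYWORDS = ("denied", "rejected", "not covered")
PENDING_KEYWORDS = ("pending", "under review", "awaiting")

# Hypothetical prefix text; real wording is part of the behavioral profile.
EMPATHY_PREFIX = "I know this isn't the news you were hoping for. "
WAITING_PREFIX = "Thanks for your patience while this is being processed. "

def apply_prefixes(deterministic_text: str, is_admin: bool = False) -> str:
    """Prepend deterministic empathy/waiting prefixes before naturalization.

    Admin users bypass all behavioral transforms and get raw output.
    Denial detection is checked first, so a denied-and-pending message
    gets the empathy prefix.
    """
    if is_admin:
        return deterministic_text
    lowered = deterministic_text.lower()
    if any(k in lowered for k in DENIAL_KEYWORDS):
        return EMPATHY_PREFIX + deterministic_text
    if any(k in lowered for k in PENDING_KEYWORDS):
        return WAITING_PREFIX + deterministic_text
    return deterministic_text
```

Because the prefix is attached before the LLM call, the empathy framing survives even if naturalization is disabled or falls back to deterministic text.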
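The Document Intelligence review gate can be illustrated as a pure function returning reason codes. The codes low_ocr_confidence and critical_handwriting_prescription are from the table; the missing_critical_field code, the 0.85 threshold, and the critical-field names are assumptions for the sketch (the real engine uses 15 codes and configurable thresholds).

```python
def review_decision(ocr_confidence: float, handwriting_risk: str,
                    doc_type: str, fields: dict[str, float | None],
                    critical_fields: tuple[str, ...] = ("total_amount",
                                                        "patient_name"),
                    min_confidence: float = 0.85) -> tuple[bool, list[str]]:
    """Deterministic review gate: returns (needs_human_review, reason_codes).

    fields maps extracted field names to per-field confidence, or None
    when extraction failed for that field.
    """
    reasons: list[str] = []
    if ocr_confidence < min_confidence:
        reasons.append("low_ocr_confidence")
    if doc_type == "prescription" and handwriting_risk == "high":
        reasons.append("critical_handwriting_prescription")
    if any(fields.get(name) is None for name in critical_fields):
        reasons.append("missing_critical_field")  # hypothetical code
    return (len(reasons) > 0, reasons)
```

Automation proceeds only when the reason list is empty; any code present routes the document to human clinical review.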
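The hard threshold, soft combo, and turn ceiling from the Handoff & Escalation row reduce to a small deterministic decision function. The 0.35 / 0.50 / 4-turn / 10-turn values are documented above; the return labels are illustrative.

```python
def handoff_decision(confidence: float, unresolved_turns: int,
                     total_turns: int) -> str:
    """Deterministic escalation gate; returns ESCALATE, COPILOT, or CONTINUE.

    Order matters: the hard confidence floor and the absolute turn ceiling
    are checked before the soft combo, so neither can be argued away by
    any other signal.
    """
    if confidence < 0.35:
        return "ESCALATE"   # hard threshold: no AI discretion
    if total_turns > 10:
        return "ESCALATE"   # absolute ceiling on conversation length
    if confidence < 0.50 and unresolved_turns >= 4:
        return "COPILOT"    # soft combo: human co-pilot mode
    return "CONTINUE"
```

Queue selection (complaints, identity failure, provider friction) would then be a separate deterministic lookup keyed on the trigger family, per the row above.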
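The Opportunity Intelligence row's composite scoring with hard suppression overrides might look like the following. Only the 0.25 trust-preservation weight and the 0.95 intrusiveness cutoff are documented; the other dimension names, their weights, and the 0.6 offer threshold are invented for illustration (the real scorer has 10 dimensions).

```python
# Illustrative subset of the scoring dimensions; weights are assumptions
# except trust_preservation, which the table documents at 0.25.
WEIGHTS = {
    "trust_preservation": 0.25,
    "health_relevance": 0.20,
    "friction_reduction": 0.15,
    "domain_adjacency": 0.15,
    "emotional_sensitivity": 0.15,
    "commerce_fit": 0.10,
}

def should_offer(scores: dict[str, float], rejected_offer: bool,
                 ignored_offers: int, intrusiveness_risk: float,
                 threshold: float = 0.6) -> bool:
    """Next-best-action decision: suppression overrides beat the composite."""
    # SUPPRESSION: hard overrides win regardless of the composite score.
    if rejected_offer or ignored_offers >= 2:
        return False
    if intrusiveness_risk >= 0.95:  # distress / sensitive / emergency context
        return False
    composite = sum(WEIGHTS[d] * scores.get(d, 0.0) for d in WEIGHTS)
    return composite >= threshold
```

Checking suppression before scoring mirrors the "trust first" design: no composite score, however high, can push an offer past an explicit rejection or a distressed member.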
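The Stage-1 governance behavior (assurance-level scope filtering plus the consent gate) can be sketched as a pure function that runs before any AI call. The scope names view_claims / view_benefits, the four assurance levels, and the ai_processing_consent scope are from the table; the specific per-scope requirements here are hypothetical (the real values are configured per tenant).

```python
ASSURANCE_LEVELS = ["none", "low", "medium", "high"]

# Hypothetical per-scope assurance requirements; configurable per tenant.
REQUIRED_ASSURANCE = {"view_claims": "medium", "view_benefits": "low"}

def evaluate_access(session_assurance: str, requested_scopes: set[str],
                    consent_scopes: set[str]) -> tuple[set[str], bool]:
    """Stage-1 governance sketch: returns (granted_scopes, ai_allowed).

    Unknown scopes default to requiring "high" assurance (fail closed).
    If ai_processing_consent is absent, the whole AI Assist family is
    denied regardless of which scopes were granted.
    """
    level = ASSURANCE_LEVELS.index(session_assurance)
    granted = {
        scope for scope in requested_scopes
        if level >= ASSURANCE_LEVELS.index(REQUIRED_ASSURANCE.get(scope, "high"))
    }
    ai_allowed = "ai_processing_consent" in consent_scopes
    return granted, ai_allowed
```

Because this runs first and its output feeds every later engine, nothing downstream can widen the granted scope set or re-enable AI without the consent scope.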
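The runtime-mode lookup with a 30-second cache, described in the footer row, can be sketched as below. The four mode names and the 30s TTL are documented; the cache structure and the `fetch_from_db` callable are assumptions standing in for the platform's config query.

```python
import time

_MODES = {"DETERMINISTIC_ONLY", "HYBRID_ASSISTED_LOCAL",
          "HYBRID_ASSISTED_AZURE", "LLM_DISABLED_BY_POLICY"}
_TTL_SECONDS = 30.0
_cache: dict[str, tuple[str, float]] = {}  # tenant_id -> (mode, fetched_at)

def current_mode(tenant_id: str, fetch_from_db) -> str:
    """Return the tenant's AI runtime mode, hitting the DB at most every 30s.

    No restart is needed to change modes: the next request after the TTL
    expires picks up the new value. Unknown values fail safe to
    DETERMINISTIC_ONLY, disabling all AI calls.
    """
    now = time.monotonic()
    cached = _cache.get(tenant_id)
    if cached and now - cached[1] < _TTL_SECONDS:
        return cached[0]
    mode = fetch_from_db(tenant_id)
    if mode not in _MODES:
        mode = "DETERMINISTIC_ONLY"  # fail safe on bad config
    _cache[tenant_id] = (mode, now)
    return mode
```

Failing safe to DETERMINISTIC_ONLY on an unrecognized value matches the document's stance that deterministic fallbacks are always available.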