Product Architecture

The Agentic Operating System for E-Commerce

Terminal6 aggregates data from every marketplace, builds unified brand context, and deploys AI agents that monitor, reason, and act — 24/7, with full auditability and human governance.

Founded 2026 · Design Partner: Sprig (11,252 SKUs) · contact@terminal6.io

Architecture: 6-Layer OS

Each layer depends only on the one below. Ship L1 + L2 + one agent without building the full stack.

L6 — Workspace UI: morning briefing, agent views, approval queue, decision feed, agent chat
L5 — Execution: action queue, API executors, retry & backoff, rollback, audit log
L4 — Policy & Governance: rule engine, approval routing, spend guardrails, kill switch, RBAC
L3 — Agent Runtime: hierarchical agents (L0 triage → L1 diagnosis → L2 execution), two-tier monitoring, skill-driven routing, evidence briefs
L2.5 — Evidence & Signals: stock proxy, funnel signals, anomaly scores, cross-channel metrics. Computed deterministically, shared across all agents.
L2 — Unified Data & Context: unified SKU graph, brand profile, inventory ledger, decision history, time-series
L1 — Integration & Ingestion: Amazon SP-API, Shopify, Flipkart, Unicommerce, GA4, Google Ads, Meta Ads, CSV

Data Architecture

Four layers from raw ingestion to decision memory. Each layer adds proprietary value.

Layer A: Raw Store

Untouched API responses & CSVs. Audit trail. Re-derive anything from raw data.

Layer B: Canonical Schema

16 normalised PostgreSQL tables. Core: the Unified SKU graph, linking ASIN, FSN, and Shopify Variant into one Terminal6 ID.

Layer C: Evidence & Signals (= L2.5)

Stock proxy, funnel signals, anomaly scores, cross-channel attribution, delivery speed scoring, competitive price indices. Computed deterministically. Shared across all agents. This is L2.5 in the OS stack.

Layer D: Decision Memory

Every agent decision: trigger → context → reasoning → policy check → outcome. This is the flywheel.

Canonical Schema (16 Entities)

ChannelListing is the Rosetta Stone — every daily table FKs to it. MasterSKU is at variant level, not parent.

Group | Entity | Purpose
Core Graph | BrandProfile | One per brand. Channels, currency, GST, thresholds.
Core Graph | BrandCategoryPolicy | Per-category margin/spend rules.
Core Graph | BrandDirective | Living strategy: meetings, tactics, overrides. Priority-stacked.
Core Graph | MasterSKU | One per variant. Internal SKU code is the universal anchor.
Core Graph | ChannelListing | Maps MasterSKU → ASIN/FSN/Variant. Central FK hub.
Time-Series | DailySales | Units, revenue, returns per SKU per channel per day.
Time-Series | DailyTraffic | Sessions, page views, conversion rate.
Time-Series | DailyInventory | Stock per SKU per FC per day.
Time-Series | CampaignDailyMetrics | Raw campaign data: spend, clicks, impressions, ACOS.
Time-Series | SKUAdAttribution | Allocated ad spend per SKU (SP = direct, SB/SD = proportional).
Time-Series | SKUEconomics | The P&L table. Full contribution margin. Quality: estimated → provisional → reconciled.
Time-Series | ChannelListingSnapshot | Daily price, rating, Buy Box %, BSR.
Time-Series | ThrottleSignals | 6 health signals computed daily per SKU.
Operations | Returns | Per-event return data with reason codes.
Operations | Alert | Generated by the evidence layer. Full lifecycle tracking.
Operations | DataImportLog | Tracks every import for auditability.

6 Throttle Signals

Every active SKU is scored on 6 signals. Computed deterministically in L2.5 (Evidence Layer) on cron. Consumed by L2 junior agents for execution decisions and by L1 senior managers for strategic oversight.

1. Inventory Cover — days of stock remaining. <3d = pause ads. 3–7d = throttle. >30d = push harder.
2. Margin Health — current margin vs the brand's floor. Below floor = stop spending. Above floor +5% = full aggression.
3. Catalogue Score — listing quality: title, images, A+ content, video. Bad listing = bad CVR = wasted spend.
4. Rating & Reviews — <3.5 stars = pause. Negative spike = throttle + alert CX. >4.0 = boost eligible.
5. Price Competitiveness — our price vs competitor. >15% more expensive = pause. <5% cheaper = push harder.
6. Delivery Speed — same-day = 1.15x bid multiplier. 3D+ = 0.8x. Speed drives conversion.
// Final bid computation
final_bid = base_bid × inventory_mult × margin_mult × catalogue_mult × rating_mult × price_mult × speed_mult
// Capped at brand's max_bid_per_click
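The multiplier chain can be made concrete with a small sketch. The function name and the treatment of a missing signal (neutral 1.0) are assumptions, not the production logic:

```python
def final_bid(base_bid: float, multipliers: dict[str, float],
              max_bid_per_click: float) -> float:
    """Combine the six throttle-signal multipliers into one bid (hypothetical helper).

    A multiplier of 0.0 (e.g. inventory cover < 3 days => pause) zeroes the bid.
    A missing signal is treated as neutral (1.0) -- an assumption, not a spec.
    """
    bid = base_bid
    for name in ("inventory", "margin", "catalogue", "rating", "price", "speed"):
        bid *= multipliers.get(name, 1.0)
    return min(bid, max_bid_per_click)  # never exceed the brand's cap
```

For example, `final_bid(10.0, {"inventory": 1.2, "speed": 1.15}, 12.0)` multiplies out to 13.8 but returns 12.0 because the cap binds.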

Knowledge Architecture

Every diagnosis has two parts: the process (universal) and the interpretation (domain-specific). Terminal6 represents both as structured, composable layers that are assembled at runtime.

Layer 1: Skills

Universal process. What factors to check, in what order. Category-agnostic. "Check if conversion dropped" but not "why" — that's the category card's job.

Layer 2: Category Cards

Domain expertise. How signals behave in a specific vertical. "Phone lifecycle drives accessories demand. Organic SEO lags new launches by 2-4 weeks."

Layer 3: Region Cards

Market context. India salary cycles, COD dynamics, festival calendars, logistics constraints. Affects all categories in a geography.

How Knowledge Composes at Runtime

Category cards and region cards compose at L1 (senior manager level) — where domain diagnosis happens. L0 doesn't need phone-lifecycle interpretation. L2 doesn't need salary-cycle context. L1 is where expertise matters.

Trigger arrives at Chief of Staff (L0):
  L0 brand_head: "D2C conversion dropped 15%. Factors: ATC drop, OOS."
  L0 context: Brand profile + anomaly summary. No category/region detail.
  L0 output: Routes to Sr. Marketing + Sr. Category
    ↓
Sr. Marketing Manager (L1) diagnoses:
  L1 sr_channels: campaign_diagnosis.md
  + Category Card: "In mobile accessories, phone lifecycle drives demand. Top SKUs entering decline phase (12-18 month cycle)."
  + Region Card: "Day 12 of month = salary cycle dip (historically lowest). No Amazon sale running = not cannibalisation."
  + Brand Context: Organic conv declining for 3 months (known trend). Sprig increased Meta spend recently (mix effect).
  L1 Diagnosis: Salary cycle dip + phone lifecycle rotation + organic lag. No campaign action needed. Monitor 2 more days.
    ↓
L2 channels/shopify: invoked for funnel deep-dive

This diagnosis is impossible without composing all layers. But the composition happens at the right level (L1), not everywhere. L0 triages without it. L2 executes without it.
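That level-gating can be sketched as a small composition helper. The card strings are invented placeholders; the point is that category and region cards attach only at L1:

```python
# Layer 1: universal process (category-agnostic skill)
SKILL = "campaign_diagnosis: check CTR, CVR, spend pacing, in order"
# Layer 2: domain expertise (illustrative card content)
CATEGORY_CARD = "mobile accessories: phone lifecycle drives demand"
# Layer 3: market context (illustrative card content)
REGION_CARD = "india: salary-cycle dip mid-month, festival calendar"

def compose_context(level: str) -> list[str]:
    """Attach interpretation layers only where diagnosis happens (L1)."""
    context = [SKILL]
    if level == "L1":  # senior manager: full domain + market interpretation
        context += [CATEGORY_CARD, REGION_CARD]
    return context     # L0 triages and L2 executes without the cards
```

L0 and L2 calls stay small and cheap; only the L1 call pays the token cost of domain knowledge.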

Agent Design: Hierarchical Team

The agent architecture mirrors a real e-commerce team. Junior agents are specialists who monitor and execute. Senior managers diagnose within their domain and resolve conflicts between their reports. The Brand Head (human operator) makes strategic decisions.

The Hierarchy

Level | Role | What They Do | Model
Brand Head | Human operator | Strategy, new directives, novel decisions | —
Chief of Staff (L0) | Anomaly triage | Morning briefing, routing, follow-ups | Haiku
Senior Managers (L1) | Domain leads | Diagnosis, strategy, conflict resolution between junior agents | Sonnet
Channel Agents (L2) | Channel-specific investigation | Within-channel deep-dive, function-level expertise | Sonnet

Five Senior Managers, Each With Specialists

Brand Head (human operator)
├── Chief of Staff (L0) — triage, morning briefing, routing
│
├── Sr. Marketing Manager (L1) — cross-channel budget, marketing strategy
│   ├── Amazon Ads Agent (L2) — bids, keywords, campaigns
│   ├── Google Ads Agent (L2) — PPC, shopping
│   └── Meta Ads Agent (L2) — prospecting, retargeting, creative
│
├── Sr. Category Manager (L1) — assortment, pricing, P&L ownership
│   ├── channels/amazon: investigate.md (L2)
│   └── channels/shopify: investigate.md (L2)
│
├── Sr. Marketplace Manager (L1) — cross-channel listings, compliance
│   ├── Amazon Marketplace Agent (L2) — listing, Buy Box, pricing
│   ├── Flipkart Agent (L2)
│   └── Shopify D2C Agent (L2) — storefront, conversion, listing sync
│
├── Sr. Ops Manager (L1) — inventory, fulfillment
│   ├── Inventory Agent (L2) — stock, reorder, allocation
│   └── Fulfillment Agent (L2) — delivery, courier, RTO
│
└── Sr. Finance Manager (L1) — P&L, settlements, compliance
    ├── P&L Agent (L2)
    └── Settlement Agent (L2)

Why Hierarchical?

Deep Expertise Fits

Amazon Ads Agent only sees Amazon ad data. 50 pages of bid expertise fit in 20K tokens because scope is narrow.

Conflicts Resolve at the Right Level

Keyword choice → junior decides alone. Amazon vs Meta budget → Sr. Marketing. Spend vs stock → BrandDirective or operator.

RCA Matches the Audience

Brand Head sees "revenue dropped." Sr. Manager sees "campaigns underperformed." Junior sees "keyword X lost position 3→8."

Hierarchical Conflict Resolution

Level | Example | Resolved By
Within agent (L2) | Which keyword to bid on | Amazon Ads Agent decides alone
Between siblings (L1) | Amazon wants ₹50K, Meta wants ₹30K, cap is ₹60K | Sr. Marketing allocates by ROAS
Cross-domain | Marketing wants to spend more, Ops says inventory low | BrandDirective auto-resolves; novel → escalate to operator

The Autonomy Flywheel

The system earns autonomy by encoding operator decisions as directives. Each strategic call the operator makes gets stored, so the same situation auto-resolves next time.

Tier 1: Auto-resolve (~80%)

BrandDirective covers it. "FBA cover < 7 days → throttle ads." Pre-decided. No human needed.

Tier 2: Resolve + inform (~15%)

Directive gives direction, agent judges specifics. "Budget split 60/40 by ROAS." Operator sees in briefing, can override.

Tier 3: Escalate (~5%)

Novel strategic question. "Inventory depleting — increase procurement ₹20L?" Operator decides. Decision becomes new directive.

Day 1: No directives. Everything escalates. Day 30: 2–3 novel situations/day. Day 90: System proposes new directives from patterns. Operator evolves from manager → strategist.
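The three tiers can be sketched as a lookup with a fallback chain. The `domain/detail` key convention and the directive store are hypothetical illustrations of the idea, not the real schema:

```python
def resolve_tier(situation: str, directives: dict[str, str]) -> tuple[int, str]:
    """Tier routing sketch.

    Exact directive hit  -> Tier 1: pre-decided, no human needed.
    Domain-level hit     -> Tier 2: directive gives direction, agent judges specifics.
    No match             -> Tier 3: novel, escalate; the operator's call becomes
                            a new directive for next time (the flywheel).
    Keys like "budget/amazon_vs_meta" are an invented convention for this sketch.
    """
    if situation in directives:
        return 1, directives[situation]
    domain = situation.split("/")[0]
    if domain in directives:
        return 2, directives[domain]
    return 3, "escalate to operator"
```

Every Tier 3 escalation that gets answered shrinks the Tier 3 share over time, which is exactly the Day 1 → Day 90 progression described above.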

Brand Directives: Living Strategy

Operator intent from meetings and ad-hoc decisions, captured as structured directives with priority stacking.

OVR — Override: Kill switch. Emergency stop. Halts all automated actions.
TAC — Tactical: "Don't touch THERMO-1L-BLK — PO landing tomorrow." Ad-hoc, expires in hours/days.
STR — Strategy: "Go aggressive on Bottles for 10 days, +30% bids, cap 50K/day." From the weekly standup.
SEA — Seasonal: "Prime Day prep: 2x inventory by July 1." Calendar-driven, annual.
PRM — Permanent: "Margin floor 28% on Bottles." Set during onboarding, rarely changes.

Key principle: Directives give intent. The Policy Engine's hard constraints are never overridden. A "go aggressive" directive cannot push margin below the floor.

Context Assembly

Each level of the hierarchy sees different data at different resolution. The harness assembles tailored context per agent call.

L0: Chief of Staff (~10K)

Summary metrics across all domains. Anomaly scores. No SKU-level detail. Brand directives. "Revenue dropped 18%. Traffic -19%. 345 SKUs PARTIAL_OOS."

L1: Senior Manager (~15K)

Domain-level detail. Campaign performance, channel trends, inventory overview. Parent's finding. "Google CTR down 40%. Meta ROAS degraded. Amazon stable."

L2: Channel Agent (~12K)

Deep, narrow. Specific keywords, bid history, competitor data, SKU listings. Parent's task. Decision history. "Keyword X lost position 3→8. Competitor increased bid."

All levels include brand context (~2K tokens) + channel knowledge (~3K) + evidence brief (~1K). Total budget: ~10–20K per call. The agent never sees all 50K SKUs — it sees the right 5–10 with deep context.
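The token arithmetic can be sketched directly. The budget figures below restate the approximate numbers from this section; the per-SKU token cost is an invented parameter:

```python
# Shared blocks included at every level (~tokens, from the text above).
SHARED = {"brand_context": 2000, "channel_knowledge": 3000, "evidence_brief": 1000}
# Approximate total budget per call, per level.
LEVEL_BUDGET = {"L0": 10_000, "L1": 15_000, "L2": 12_000}

def detail_budget(level: str) -> int:
    """Tokens left for level-specific detail after the shared blocks."""
    return LEVEL_BUDGET[level] - sum(SHARED.values())

def pick_skus(ranked_skus: list[str], per_sku_tokens: int, level: str) -> list[str]:
    """Never all 50K SKUs: only as many top-ranked SKUs as the budget allows."""
    k = detail_budget(level) // per_sku_tokens
    return ranked_skus[:k]
```

At ~600 tokens of deep context per SKU, an L2 call fits about 10 SKUs, which matches the "right 5–10 with deep context" claim.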

Policy Engine

Every agent action passes through the policy gate. Hard guardrails that agents cannot bypass.

Type | Example | Enforcement
Hard Constraint | Margin floor 28% | Deterministic block
Approval Threshold | Spend > ₹5,000/day | Routes to founder
Auto-Execute | Pause ads on OOS | Immediate + logged
Time Window | No changes 10PM–6AM | Queues for next window
Escalation | ROAS < 2x for 3 days | Bypasses queue, alerts founder
Kill Switch | Emergency stop | Halts ALL actions
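A minimal sketch of the gate's check order: kill switch first, then hard constraints, then approval routing. Field and policy names are illustrative, not the real schema:

```python
def policy_gate(action: dict, policy: dict) -> str:
    """Deterministic policy gate sketch. Agents cannot bypass this function."""
    if policy.get("kill_switch"):
        return "halt"  # emergency stop halts ALL actions
    # Hard constraint: margin floor is a deterministic block, never overridden.
    if action.get("resulting_margin", 1.0) < policy["margin_floor"]:
        return "block"
    # Approval threshold: large spend routes to a human.
    if action.get("daily_spend", 0) > policy["approval_spend_threshold"]:
        return "needs_approval"
    return "execute"  # auto-execute, logged to the audit trail
```

Note the ordering encodes the key principle stated later: a "go aggressive" directive can raise bids, but nothing it produces survives the margin-floor check.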

Three-Level Skills

Skills match the hierarchy. Each level receives findings from above and produces output for the level below AND a summary going up.

L0 Investigation Skills

Brand Head reads these.
Input: anomaly batch
Output: "Revenue dropped X%. Top factors: traffic, OOS, cannibalisation"
Routes to: relevant senior managers

L1 Domain Diagnosis

Senior managers read these.
Input: L0 finding + domain evidence
Output: "Google CTR down 40%. Meta ROAS degraded."
Routes to: relevant junior agents

L2 Channel Skills

Junior agents read these.
Input: L1 task + deep data
Output: "Paused 15 campaigns. Bid increase ₹12→17 pending approval."
Executes: API calls

Two L0 roles

brand_head/investigate.md

Triggered by anomalies. Reactive monitoring detects outcome degradation → investigate.md diagnoses the top factors → routes to L1 managers.
Skill file: investigate.md (standardised across all roles)

chief_of_staff skills (morning_briefing.md, event_alert.md, business_review.md)

Triggered by operator request. Strategy mode fans out to L1 agents → synthesises a structured deliverable (plan, forecast, scenario).
Skill files: morning_briefing.md, event_alert.md, business_review.md

Routing: Declared in Skill, Activated by Evidence, Dispatched by Harness

Every investigate.md declares an Input Metrics table mapping owned metrics → downstream agents. When a metric deviates, the agent checks which input metrics also deviated and invokes the owning agent. The harness assembles context. Fully auditable.

brand_head/investigate.md declares:
  finding: "traffic_degradation" --> route to: Sr. Marketing
  finding: "partial_oos" --> route to: Sr. Marketplace + Sr. Ops
  finding: "cannibalisation" --> route to: Sr. Category

chief_of_staff/morning_briefing.md + brand_head/business_review.md:
  input_needed: "channel_roas" --> consult: Sr. Marketing
  input_needed: "margin_outlook" --> consult: Sr. Finance
  input_needed: "growth_priority" --> consult: Sr. Category

Same routing mechanism, different trigger (anomaly vs operator request). The same pattern cascades L1 --> L2.
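The declare/activate/dispatch split might look like this on the harness side; the routing table simply transcribes the findings above:

```python
# Declared in brand_head/investigate.md (findings taken from the text above).
ROUTES = {
    "traffic_degradation": ["Sr. Marketing"],
    "partial_oos": ["Sr. Marketplace", "Sr. Ops"],
    "cannibalisation": ["Sr. Category"],
}

def dispatch(active_findings: list[str]) -> list[str]:
    """Harness dispatch sketch: evidence activates findings, the table names owners."""
    targets: list[str] = []
    for finding in active_findings:
        for agent in ROUTES.get(finding, []):
            if agent not in targets:  # de-duplicate so each manager is invoked once
                targets.append(agent)
    return targets
```

Because the table lives in the skill file and the dispatch is deterministic, every routing decision is auditable: you can replay which findings fired and why each manager was invoked.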

Decision Flow

The harness walks the tree: top-down for investigation, bottom-up for summary.

Tier 1 Statistical Anomaly: "CVR -2.3 sigma, Flipkart revenue +3.1 sigma"
  |
  v
Chief of Staff (L0, Haiku)
  reads: L0 conversion_drop.md + anomaly evidence
  activates: [traffic_degradation, partial_oos]
  routes to: Sr. Marketing, Sr. Marketplace, Sr. Ops
  |
  +--> Sr. Marketing (L1, Sonnet)           }
  |      reads: L1 campaign_diagnosis.md    }
  |      activates: [oos_suppression]       }  run in
  |      routes to: Amazon Ads, Meta Ads    }  PARALLEL
  |                                         }
  +--> Sr. Marketplace (L1, Sonnet)         }
  |      activates: [listing_sync_broken]   }
  |      routes to: Shopify D2C Agent       }
  |                                         }
  +--> Sr. Ops (L1, Sonnet)                 }
         activates: [warehouse_depletion]   }
         routes to: Inventory Agent         }
  |
  v
Channel Agents (L2, Sonnet, invoked by L1)
  Amazon Ads: "Pause 15 OOS SKU campaigns"
  Meta Ads: "Exclude 15 SKUs from retargeting"
  Shopify D2C: "Fix inventory sync for 345 SKUs"
  Inventory: "Reorder 12 SKUs, reallocate FC stock"
  |
  v
Policy Engine --> check each action --> approve / escalate
  |
  v
Execute --> API calls to Amazon, Meta, Shopify
  |
  v
Decision Record --> Layer D (Decision Memory)
  |
  v
Summary flows BACK UP the tree:
  L2 reports to L1: "Paused 15 campaigns. Fixed 200 of 345 listings. 145 need manual fix."
  L1 reports to L0: "Marketing: OOS SKUs suppressed. Marketplace: 58% auto-fixed, 42% escalated."
  L0 compiles: Morning Briefing for operator with actions taken + decisions needed

Typical call volume per trigger: 1 Haiku (L0) + 1–2 Sonnet (L1) + 1–3 Sonnet (L2) = 3–6 LLM calls. Not every trigger activates all branches.

Monitoring: Reactive + Proactive

Two directions of information flow. Reactive (top-down): outcome degraded, investigate why. Proactive (bottom-up): input changed, act before the outcome degrades. Both use the same harness, routing, and policy engine.

Reactive: Two-Tier Outcome Monitoring

Static thresholds don't work — validated empirically. Replaced with self-calibrating detection + LLM triage.

Tier 1: Statistical Anomaly Detection

No LLM. Runs on cron. Free.
z-scores / percentile ranks adapted to each brand's own variance. Catches sudden deviations AND slow structural trends. Outputs: daily anomaly list with severity scores.

Instead of "fire if CVR drops >15%": fire if CVR is 2.3σ below this brand's trailing-90d distribution, adjusted for day-of-week.
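That self-calibrating rule can be sketched in a few lines: a z-score computed against the same weekday's trailing values, so the firing threshold adapts to each brand's own variance. Function and data shapes are illustrative:

```python
from statistics import mean, stdev

def anomaly_sigma(today: float, history: list[tuple[int, float]],
                  weekday: int) -> float:
    """Z-score of today's value vs the trailing distribution for this weekday.

    `history` is a list of (weekday, value) pairs, e.g. from the trailing 90 days.
    """
    same_day = [v for d, v in history if d == weekday]  # day-of-week adjustment
    mu, sd = mean(same_day), stdev(same_day)
    return (today - mu) / sd if sd else 0.0
```

The same absolute dip scores very differently for a low-variance brand (fires) than for a noisy one (ignored), which is exactly why a fixed "CVR drops >15%" threshold fails.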

Tier 2: LLM Triage (= Chief of Staff)

One Haiku call per daily batch. Cheap.
Reads anomaly list + brand context + directives. Filters noise ("cart abandon +1.1σ on Saturday = normal"). Connects related anomalies ("CVR dip + Flipkart spike = one situation"). Matches to investigate.md. Routes to senior managers.

Full investigation skill (Sonnet) only runs when Tier 2 says "yes" — expensive but rare (2–3 situations/week, not 30/day).

Proactive: Input Monitoring (Bottom-Up)

Don't wait for revenue to drop. L2 agents monitor the inputs to revenue — campaigns, listings, inventory, pricing, delivery — and act before the impact hits.

Agent (L2) | Critical (auto-act) | High (act in hours) | Medium (morning briefing)
Amazon Ads | Campaign suspended | Budget exhausted mid-day | CTR declining 3 days (fatigue signal)
Google / Meta Ads | Account suspended | Spend pacing 2× ahead of plan | CPC rising steadily
Amazon Marketplace | ASIN deactivated | Buy Box lost on hero SKU | Competitor price dropped 10%+
Shopify D2C | "Sold Out" despite warehouse stock | Checkout errors spiking | SEO ranking dropped
Inventory | Stock = 0 on hero SKU | Cover < 3 days (past reorder point) | Cover < 14 days; depletion accelerating
Fulfillment | Courier partner down in region | Delivery SLA degraded | RTO rate spiking in pincode cluster

Proactive Alert Flow

L2 agent detects input change (hourly checks or webhooks)
  |
  ├── CRITICAL → auto-act if BrandDirective exists
  |     "Campaign suspended → auto-restart, suppress related ads, notify L1"
  |     Action logged. Operator sees in morning briefing.
  |
  ├── HIGH → propose action to L1 for quick approval
  |     "Buy Box lost on hero SKU → repricing recommended"
  |     L1 reviews within hours.
  |
  ├── MEDIUM → included in morning briefing
  |     "CTR declining 3 days on Campaign X — creative fatigue signal"
  |     L1 assigns to L2 if action needed.
  |
  └── CROSS-DOMAIN → L2 flags to L1, L1 escalates to Chief of Staff
        "Listing suppressed due to pricing violation" → Sr. Marketplace detects,
        but the fix requires Sr. Category (pricing) → Chief of Staff routes to the right manager

The Morning Briefing Combines Both

Morning Briefing for Brand Head:

PROACTIVE (caught overnight):
  ✓ Amazon campaign X suspended — auto-restarted per directive
  ✓ Stock on SKU Y hit 3-day cover — reorder triggered
  ⚠ Buy Box lost on hero SKU Z — repricing proposed, awaiting approval

REACTIVE (detected this morning):
  📊 D2C CVR anomaly detected (-2.1σ) — investigating traffic mix

INFORMATIONAL:
  ↘ CTR declining on 2 Google campaigns (day 3 of trend)
  ↘ Competitor A dropped price 12% on phone cases — impact TBD

This is the difference between a recommendation engine and an operating system. Without proactive alerts: "Revenue dropped 30% over the weekend — why?!" With proactive alerts: "Campaign X was suspended and auto-restarted at 8:12pm. Revenue impact: <₹2K. No action needed."

Conversational Interface

Alerts and briefings are push-based (system → operator). But operators also need to pull — ask questions, explore data, make sense of what they're seeing. The chat is the missing piece between the morning briefing and the next day's briefing.

Three modes of interaction

Push

System → Operator.
Morning briefings, proactive alerts, reactive investigations. Covers known patterns. The system speaks first.

Pull

Operator → System.
"Why is Flipkart outperforming Amazon this month?" "Show me top 10 SKUs by margin." Novel questions no skill anticipated. The operator speaks first.

Follow-up

System → Operator (contextual).
"You asked about Samsung S24 stock — 3 other S2x cases also have <5 day cover with ads still running." The system adds what the operator didn't think to ask.

One interface, four modes

The operator talks to one interface — Terminal6. Behind it, the system detects intent and activates the right mode. The operator never chooses a mode — the system figures it out.

Mode | Triggered by | What happens | Model
Data query | "What's my revenue this week?" | CoS: SQL query → grounded answer + follow-up suggestion | Haiku
Investigation | "Why did revenue drop?" | CoS routes to L1 → L2 cascade. Returns structured diagnosis. | Sonnet
Strategy / Planning | "Build me a Q2 budget plan" | Fan-out to multiple L1 agents in parallel → synthesise into structured plan | Sonnet / Opus
Action | "Pause that campaign" | Route to L2 agent + policy check → execute or escalate | Sonnet
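A toy intent router for the four modes might look like the keyword heuristic below. In the real system this decision is presumably the Haiku triage call, so treat these rules as placeholders for illustration:

```python
def detect_mode(message: str) -> str:
    """Keyword-heuristic sketch of intent detection (stand-in for LLM triage)."""
    text = message.lower()
    if text.startswith("why"):
        return "investigation"   # causal question -> L1/L2 cascade
    if any(w in text for w in ("plan", "forecast", "scenario")):
        return "strategy"        # deliverable request -> fan-out to L1 agents
    if any(w in text for w in ("pause", "resume", "increase", "set ")):
        return "action"          # instruction -> L2 agent + policy check
    return "data_query"          # default: grounded SQL lookup
```

The operator never sees this routing; they just type, and the system picks the cascade.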

Strategy mode: planning with skills

When the operator asks for a plan, the system fans out to multiple L1 domain experts, then synthesizes their inputs into a structured deliverable. Same hierarchy, same agents — used in planning mode instead of monitoring mode.

Operator: "Build me a budget allocation plan for Q2"
  |
  v
Strategy mode fans out to L1 agents (parallel):
  ├── Sr. Marketing: "Q1 ROAS per channel? Campaign trends?"
  ├── Sr. Finance: "Margin constraints? Cash flow outlook?"
  ├── Sr. Category: "Which categories have highest growth potential?"
  └── Sr. Ops: "Inventory outlook? Restock risks in Q2?"
  |
  v (all respond)
Synthesis (Sonnet, big context):
  "Amazon: ROAS 4.2x, margin 32% → +20% budget
   Google: ROAS 2.8x, declining CTR → maintain, refresh creative
   Meta: ROAS 1.9x, strong for launches → shift to launch-only
   Flipkart: ROAS 3.1x, growing share → +15%
   Proposed Q2 budget: [structured table]
   Constraints applied: margin floor 28%, TACoS cap 12%
   Want me to detail the per-campaign breakdown?"

Planning skills (chief_of_staff + brand_head)

Each skill is a structured planning document — same structure as investigation skills (input → reasoning → routes → output) but triggered by operator request, not anomalies, and producing structured deliverables.

Skill | What it produces | L1 agents consulted
Demand planning | SKU-level demand forecast per channel, factoring seasonality + launches | Category + Ops + Marketing
Budget allocation | Optimised marketing spend across channels within margin/TACoS constraints | Marketing + Finance
Launch planning | New product timeline: inventory, listings, ads, pricing sequence | All five
Scenario analysis | "What if we cut Amazon 30%?" — modelled impact on revenue, margin, share | Depends on scenario
Competitive response | Pricing + ad + listing response plan to a competitor move | Category + Marketing + Marketplace
Quarterly review | 90-day synthesis with strategic recommendations for next quarter | All five + historical data

Conversation flow: from question to directive

Operator: "Why did D2C revenue drop last week?"
  → Investigation mode: CoS routes to L1/L2, returns diagnosis
Operator: "Show me the Amazon pricing detail"
  → Follow-up: routes to Sr. Category → Amazon Marketplace Agent
Operator: "Don't bid on these SKUs until Amazon's promo ends"
  → Action mode: captures as BrandDirective (TACTICAL, [sku_list])
  → Routes to Amazon Ads Agent: pause campaigns

One conversation: data → diagnosis → decision → directive. The system learns from every interaction.

Same hierarchy, same routing, same policy engine. The only difference: the trigger is a chat message instead of an anomaly or a cron job.

Data grounding (the hard part)

Every answer must trace to a real database query. The LLM decides what to query; tools execute it; the LLM formats the answer. Data is always real, never generated.

Bad: Generated

"Based on typical e-commerce patterns, your top SKU is likely..."
Hallucination risk. Wrong numbers that get acted on. Never acceptable.

Good: Grounded

"Vivo X200 FE EdgeTone case — ₹42,318 this week, +12% vs last week."
Source: daily_sales, Apr 1–7. Traceable, verifiable.
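The grounding loop can be sketched with SQLite standing in for the real store. The schema and helper name are invented; the point is that the model only formats, while the number always comes from the query:

```python
import sqlite3

def grounded_answer(db: sqlite3.Connection, sku: str) -> str:
    """Grounding sketch: the figure in the answer is read from the DB, never generated."""
    row = db.execute(
        "SELECT SUM(revenue) FROM daily_sales WHERE sku = ?", (sku,)
    ).fetchone()
    total = row[0] or 0  # no rows -> honest zero, not a guessed number
    return f"{sku}: ₹{total:,} this week (source: daily_sales)"
```

Attaching the source table (and, in practice, the date range) to every answer is what makes each claim traceable and verifiable.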

Phased rollout

Phase | Capability | Example
Phase 1 (MVP) | Read-only chat. Ask questions, get data-grounded answers + follow-up suggestions. | "What's my Shopify revenue this week?" → answer + "3 SKUs are phantom OOS, want details?"
Phase 2 | Action through chat. Operator gives instructions, system executes via L2 agents + policy checks. | "Pause that campaign" → "Paused. Revenue impact est. ₹1,200/day. Resume when?"
Phase 3 | Directive capture. System recognises recurring instructions and proposes permanent directives. | "You've paused ads on OOS SKUs 5 times. Make it automatic?" → new PERMANENT directive

Why this matters for the business

Engagement is the wedge. The operator starts by asking questions (zero setup, zero trust required). Gets useful, grounded answers. Builds trust. Enables alerts. Enables autonomous actions. Chat is how you earn the right to act. Daily active usage is the leading indicator of expansion from intelligence → execution → full OS. Every conversation is also a directive source — accelerating the autonomy flywheel.

LLM Strategy

Context assembly, not model training. Skills, brand context, category cards, and decision history are injected at runtime. No fine-tuning needed.

No LLM (Tier 1 + input monitors)

Statistical anomaly detection. Proactive input checks. z-scores, change-point detection. Runs on cron. Cost: zero.

Haiku (L0 triage + simple chat)

Chief of Staff: anomaly triage, morning briefing, simple data queries. Fast and cheap — handles 60%+ of all interactions.

Sonnet (L1 + L2 + complex chat)

Domain diagnosis, deep execution reasoning, complex RCA through conversation. 3–6 calls per triggered situation; 5–10 per complex chat session.