№ 14 · AI products

Chamberlain Insights

Multi-domain AI platform replacing static OKR dashboards with interactive analytics.

Live · 2026 · Chamberlain · React/Vite/TS · FastAPI · LangGraph · Databricks

Summary

An AI-powered C-level analytics platform that replaces a static BI dashboard with a team of specialised agents that reason, self-correct, and escalate to humans when judgment is required. Purpose-built for one company's OKR structure — not a generic platform — and validated to the penny against the legacy BI tool.

Details

My role: AI product engineer
Period: 2026
Status: Live
Stack: React/Vite/TS · FastAPI · LangGraph · Databricks · Unity Catalog · SQL Statement API · pgvector

Context

The brief was simple on paper and unforgiving in practice: replace a weekly-refreshed BI dashboard used by the executive team with something that could answer cross-domain questions in natural language, not just render KPI tiles. Five OKR domains, dozens of key results, three upstream data systems — financial, subscription billing, dimensional gold — and a non-negotiable accuracy bar: every published number had to reconcile to the legacy BI tool to the penny. The starting point was a single-pipeline POC that handled "what is X?" cleanly but collapsed on questions that required decomposition: why did one segment's volume drop while its average selling price grew, where is the gap between target and actuals coming from. Linear pipelines do not answer those questions. A team of agents can.

Architecture

A Databricks App with a React 19 + Vite + TypeScript frontend and a FastAPI backend wrapping a LangGraph multi-agent pipeline: a Supervisor StateGraph plus three specialist sub-graphs, all running on Databricks serving endpoints, Unity Catalog, and a Lakebase Postgres with pgvector.

  1. Supervisor StateGraph that decomposes executive questions into sub-tasks and reassembles the final answer.
  2. Data Agent with eleven nodes, including intent, golden cache, few-shot, SQL generation, validation, execution, and narration.
  3. Reasoning Agent dedicated to cross-domain narrative synthesis over consolidated sub-agent results.
  4. Validation Agent — deterministic, no LLM in the loop — for cross-reference against the legacy BI tool.
  5. Inference on Databricks serving endpoints, with Claude Opus, Sonnet, or Haiku chosen per role and per-node cost.
  6. SQL execution via the SQL Statement API against Unity Catalog reading only from the curated semantic schema.
  7. Lakebase Postgres with pgvector for the 1024-dimensional golden cache, pinned charts, and the human-review queue.
  8. MLflow Tracing instruments every LLM call and SSE streams pipeline execution to the frontend in real time.
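
The golden-cache lookup in item 7 can be illustrated with a minimal sketch. It is not the production code: in the deployed system the comparison would presumably run inside Postgres through pgvector rather than in Python, and the names `GoldenEntry` and `lookup_golden`, as well as the 0.9 similarity threshold, are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class GoldenEntry:
    """Hypothetical cache row: a validated question, its SQL, and its embedding."""
    question: str
    sql: str
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_golden(
    query_embedding: Sequence[float],
    entries: Iterable[GoldenEntry],
    threshold: float = 0.9,  # assumed cut-off, not taken from the project
) -> Optional[GoldenEntry]:
    """Return the most similar entry at or above the threshold, else None."""
    best: Optional[GoldenEntry] = None
    best_score = threshold
    for entry in entries:
        score = cosine_similarity(query_embedding, entry.embedding)
        if score >= best_score:
            best, best_score = entry, score
    return best
```

A hit returns validated SQL that can skip generation entirely; a miss (`None`) falls through to the text-to-SQL path.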

Key decisions

Multi-agent over monolith.
A Supervisor decomposes the executive question into sub-tasks and dispatches them to three specialists — Data Agent (text-to-SQL with self-correction and a golden cache), Reasoning Agent (cross-domain narrative synthesis), and Validation Agent (deterministic cross-reference, no LLM in the loop) — before reassembling the answer. The pattern is industry-standard in finance ops; the value here is the orchestration discipline.
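
The decompose, dispatch, and reassemble flow described above can be sketched in plain Python. This is a framework-free stand-in, not the LangGraph StateGraph itself; `SubTask`, `run_supervisor`, and the agent callable signature are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SubTask:
    agent: str    # which specialist should handle this step
    payload: str  # what the specialist is asked to do


# A specialist receives its payload plus the results produced so far.
Agent = Callable[[str, List[str]], str]


def run_supervisor(
    question: str,
    plan: Callable[[str], List[SubTask]],
    agents: Dict[str, Agent],
) -> str:
    """Decompose the question, dispatch each sub-task, and join the results."""
    results: List[str] = []
    for task in plan(question):
        if task.agent not in agents:
            raise KeyError(f"no specialist registered for {task.agent!r}")
        # Pass a copy so a specialist cannot mutate the shared history.
        results.append(agents[task.agent](task.payload, list(results)))
    return " | ".join(results)
```

Keeping the plan as data (a list of sub-tasks) rather than hard-coded calls is what lets the supervisor route a "why" question through several specialists while a simple "what is X?" touches only one.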
Self-correction as a product feature.
When generated SQL fails or produces an outlier, the Data Agent reads the error, retries with explicit feedback, and surfaces the trace. When confidence is low or the impact is high, a HITL checkpoint fires through the graph framework's interrupt-and-resume primitive.
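
A minimal sketch of that retry loop follows. It simplifies the real behaviour in two ways that should be read as assumptions: escalation here happens only after the retry budget is exhausted (the confidence and impact checks are omitted), and the HITL checkpoint is represented by a returned `needs_human` status rather than an actual graph interrupt. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Attempt:
    sql: str
    error: Optional[str]  # None when the attempt succeeded


@dataclass
class Outcome:
    status: str  # "answered" or "needs_human"
    rows: Optional[list] = None
    trace: List[Attempt] = field(default_factory=list)


def answer_with_self_correction(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    execute: Callable[[str], list],
    is_outlier: Callable[[list], bool],
    max_attempts: int = 3,  # assumed retry budget
) -> Outcome:
    """Generate SQL, run it, and retry with explicit feedback on failure."""
    trace: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, feedback)
        try:
            rows = execute(sql)
        except Exception as exc:  # runtime error: feed the message back
            feedback = f"execution failed: {exc}"
            trace.append(Attempt(sql, feedback))
            continue
        if is_outlier(rows):  # suspicious result: retry instead of publishing
            feedback = "result flagged as an outlier"
            trace.append(Attempt(sql, feedback))
            continue
        trace.append(Attempt(sql, None))
        return Outcome("answered", rows, trace)
    # Retry budget exhausted: hand over to a human with the full trace.
    return Outcome("needs_human", None, trace)
```

The trace is returned in both outcomes, so the user sees every failed attempt and the feedback that drove the next one.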
Penny-perfect or it does not ship.
Every published number is reconciled against the legacy BI tool within documented tolerance. Numbers that cannot match are flagged "TBD," never silently approximated: executive trust is built one answer at a time and lost in a single one.
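
The publish-or-flag rule can be sketched with exact decimal arithmetic. This is an illustration under assumptions, not the project's validation code: `publishable_value` is a hypothetical name, the default tolerance of zero cents stands in for the documented per-metric tolerance, and rounding half-up to the cent is an assumed convention.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Union

Number = Union[Decimal, str, int]
CENT = Decimal("0.01")


def to_cents(value: Number) -> Decimal:
    """Round to two decimal places without passing through binary floats."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def publishable_value(
    platform_value: Number,
    legacy_value: Optional[Number],
    tolerance: Decimal = Decimal("0.00"),  # assumed default: exact match
) -> str:
    """Return the figure as text if it reconciles, otherwise the literal 'TBD'."""
    if legacy_value is None:
        return "TBD"  # nothing to reconcile against
    platform = to_cents(platform_value)
    legacy = to_cents(legacy_value)
    if abs(platform - legacy) <= tolerance:
        return str(platform)
    return "TBD"  # never silently approximate
```

Using `Decimal` rather than `float` matters here: a one-cent comparison on binary floats can fail or pass for representation reasons that have nothing to do with the data.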

Lessons learned

  • The most expensive bug we caught was a double-digit-percent discrepancy that everyone assumed was a definition mismatch. Four commercial volume metrics in the financial domain were over-counting against the legacy tool. The first three hypotheses were all semantic — segment, hierarchy, product-type filter. We rewrote SQL three times. Nothing closed the gap.
  • The root cause lived upstream. The legacy tool was reading a different source table (a model-layer financial-statement view, not the daily fact) and applying a hidden currency-type filter inherited from the source system that never reached our copy. The lesson: validate the source before debating the semantics. Every subsequent metric got a source-table provenance check before anyone touched a SQL fragment.
  • Non-negotiable architectural rule: the application reads only from the curated semantic schema, never from raw source tables. Enforced at the configuration layer so it cannot sneak in via a feature branch.
  • Validation moved out of the application repository and into reusable notebooks owned by the data-engineering team, with a sub-tenth-of-a-percent variance gate before any source flag flips on in production.
  • The system catches its own anomalies live. The demo flow deliberately shows the agent flagging a value, retrying with a corrected filter, and recovering the right answer in front of the audience — transparency is the product.
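
Two of the rules above, curated-schema-only reads and the variance gate, lend themselves to a short sketch. Everything specific in it is an assumption: the schema name `analytics.semantic` is a placeholder, the regex is a deliberately naive table extractor (it does not handle quoted identifiers, and CTE names are reported as violations), and the 0.1% threshold is one reading of "sub-tenth-of-a-percent".

```python
import re
from decimal import Decimal
from typing import List

CURATED_SCHEMA = "analytics.semantic"  # hypothetical catalog.schema name

# Naive: captures the identifier that follows FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def violations(sql: str, curated_schema: str = CURATED_SCHEMA) -> List[str]:
    """List referenced tables that are not inside the curated schema."""
    prefix = curated_schema.lower() + "."
    tables = [name.lower() for name in _TABLE_REF.findall(sql)]
    return [name for name in tables if not name.startswith(prefix)]


def assert_curated_only(sql: str, curated_schema: str = CURATED_SCHEMA) -> None:
    """Raise before execution if the query reads outside the curated schema."""
    bad = violations(sql, curated_schema)
    if bad:
        raise PermissionError(f"query reads outside {curated_schema}: {bad}")


def passes_variance_gate(
    candidate: Decimal,
    reference: Decimal,
    max_relative: Decimal = Decimal("0.001"),  # assumed: below 0.1%
) -> bool:
    """True when the relative gap to the reference is under the threshold."""
    if reference == 0:
        return candidate == 0
    return abs(candidate - reference) / abs(reference) < max_relative
```

A guard like this fails closed on unqualified names, which is the safer default for a rule meant to be impossible to bypass from a feature branch.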

What it enables

  • Five of six OKR domains live in the platform, covering most of the key-result surface — every domain whose source data exists in the warehouse.
  • Backend and frontend feature-complete: five dashboard cards, natural-language query input, multi-agent chat with reasoning trace, HITL confirm/refine/reject UX, per-user pinned charts, an admin panel with golden-query review, and a Three.js-rendered shell.
  • A thirty-five-query golden cache seeded from validated SQL, delivering sub-three-second answers on the common asks.
  • Text-to-SQL covering the long tail in under eight seconds with self-correction over runtime errors and detected outliers.
  • Penny-perfect validation in two domains against the legacy BI tool, two more reconciling within strict tolerance, and one directional pending an upstream rebuild.
  • Live demo where the system self-corrects in front of the audience — flags a value, retries, and recovers the right answer with no manual intervention.

Status & roadmap

Current state
Backend and frontend feature-complete across five OKR domains; thirty-five-query golden cache in production; two domains validated to the penny, two within tolerance, and one directional pending an upstream rebuild; demo flow shipped and operational.
Next steps
Sixth domain unlocked when its source data lands in the warehouse. The lessons from the three-workstream PBI alignment pivot inform the upstream contract for the remaining domain to ensure penny-perfect validation holds when it is added.