Bacardi Insights
Analytical chatbot for the C-level: ask in natural language about P&L and market share; get back executed SQL, charts, and an executive summary.
Summary
A multi-agent analytics platform for the Bacardi C-suite. Natural-language questions on financial performance and market share resolve through a LangGraph supervisor, a three-tier cache, and deterministic chart rendering — and ship with the SQL, the trace, and the routing rationale visible. Built to beat black-box copilots on transparency and speed.
Details
- My role
- AI product engineer
- Period
- 2026
- Status
- Live
- Stack
- LangGraph, MLflow, FastAPI, React/TS, Databricks, Genie Spaces, Vector Search, Lakebase, Recharts, ChatDatabricks
Context
C-suite users at a global spirits group needed answers on financial performance and market share without the 24-to-72-hour analyst loop. The in-house alternative was a copilot deployment optimised for breadth and brand familiarity; what it could not offer was legibility. Executives accept AI in the workflow only when they can see why a number is what it is. A black-box answer in a quarterly review is not an answer; it is a liability.
The first-generation chatbot proved the appetite but exposed the architectural cost: a monolithic agent, a frontend orchestrating multi-step plans, twenty-six phases of accumulated workarounds, and three dead code paths. Latency was unpredictable, the rendering layer was fragile, and adding a domain meant editing four files in two repositories.
v2 had to keep the wins (golden cache, fallback layer, Recharts rendering) and discard the orchestration pattern entirely. The brief: an analytics platform with five modules, a clean LangGraph spine, and transparency as a first-class product feature, not a debug tool.
Architecture
A single Databricks App ships backend and frontend in one deploy. The backend is a LangGraph StateGraph with five specialist nodes; the frontend renders Recharts deterministically and exposes the trace, the SQL, and the routing decision next to every answer.
- Supervisor split into three traceable nodes — rewrite, intent, routing — instead of a single opaque step.
- Tier 1: Vector Search-backed golden SQL cache matched by cosine similarity, with an LLM arbitrator in the gray zone.
- Tier 2: text-to-SQL with a five-stage validator enforcing CTE aggregation followed by FULL OUTER JOIN over the financial mart.
- Tier 3: Genie Spaces fallback for the long tail that fits neither the cache nor the validator.
- LLM layer with `ChatDatabricks.with_fallbacks([Sonnet → Haiku → backup models])` — resilience at library level.
- State persisted in Lakebase Postgres via the LangGraph checkpointer; survives proxy timeout and browser refresh.
- MLflow `mlflow.langchain.autolog()` traces every node as a span for LLM auditability.
- React/TypeScript frontend renders Recharts deterministically — same data, same chart, always.
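
The Tier 1 decision described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses the 0.90 and 0.80 thresholds stated under Key decisions, while `CacheHit`, `route_tier1`, and the `arbitrator` callable are hypothetical names introduced here. The arbitrator stands in for the LLM call and is assumed to return `True` when the user's question is a genuine paraphrase of the cached one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

EXECUTE_THRESHOLD = 0.90  # at or above: run the cached golden SQL directly
GRAY_ZONE_FLOOR = 0.80    # from here up to 0.90: defer to the arbitrator


@dataclass(frozen=True)
class CacheHit:
    """Best match returned by the golden cache (hypothetical shape)."""
    question: str      # golden question stored in the cache
    sql: str           # golden SQL paired with that question
    similarity: float  # cosine similarity to the user's question


def route_tier1(
    user_question: str,
    hit: CacheHit | None,
    arbitrator: Callable[[str, str], bool],
) -> tuple[str, str]:
    """Return (decision, rationale) so the routing can be shown to the user."""
    if hit is None or hit.similarity < GRAY_ZONE_FLOOR:
        return "text_to_sql", f"no golden entry at or above {GRAY_ZONE_FLOOR:.2f}"
    if hit.similarity >= EXECUTE_THRESHOLD:
        return "execute_cached", f"similarity {hit.similarity:.2f} >= {EXECUTE_THRESHOLD:.2f}"
    # Gray zone: the score alone is not trusted, so the arbitrator decides.
    if arbitrator(user_question, hit.question):
        return "execute_cached", f"arbitrator accepted paraphrase at {hit.similarity:.2f}"
    return "text_to_sql", f"arbitrator rejected match at {hit.similarity:.2f}"
```

Returning the rationale alongside the decision is one way to keep the routing step visible next to the answer instead of buried in logs.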
Key decisions
- Multi-agent over monolith.
- Specialist nodes per domain instead of a single agent. Orchestration belongs server-side, not in a React component: the migration removes multi-step plans from the frontend and lets each node assume a bounded, testable responsibility.
- Three-tier cache with gray-zone arbitration.
- Cosine similarity ≥ 0.90 executes the cached SQL directly; scores from 0.80 up to 0.90 go to an LLM arbitrator; anything below 0.80 falls through to text-to-SQL. Most queries never reach Genie, and paraphrases in the gray zone are never silently classified as misses.
- `ChatDatabricks` + `.with_fallbacks()` + autolog.
- Replaces roughly three hundred lines of custom retry, logging, and parsing code. Resilience lives at the library layer, not sprinkled across nodes: a single swap absorbs the entire concern and shrinks the maintenance surface.
- Validator-enforced SQL pattern.
- The validator enforces CTE aggregation followed by FULL OUTER JOIN over the financial mart. It is a guardrail in code, not in a prompt: it blocks join fan-out, where rows are silently multiplied, before a wrong number ever reaches an executive screen.
- Job-based execution plus Lakebase checkpointer.
- Survives the ninety-second proxy timeout and a browser refresh, and enables HITL via `interrupt_before` once the flow needs it. State persistence stops being a frontend workaround and becomes a backend primitive.
- Deterministic chart selection in two places.
- Backend `chart_spec.py` and frontend `chart-selector.ts`, with parity enforced by tests. Same data, same chart, always — no client-side surprises during an executive review.
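
Deterministic chart selection comes down to a pure function of the result shape. The sketch below shows the idea only; the actual rules in `chart_spec.py` are not given in this write-up, so the function name `select_chart`, the chart labels, and the selection rules are all assumptions made for illustration.

```python
from __future__ import annotations

from datetime import date


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def select_chart(rows: list[dict]) -> str:
    """Pick a chart type from the data shape alone (hypothetical rules)."""
    if not rows:
        return "table"
    columns = list(rows[0].keys())
    numeric = [c for c in columns if all(_is_number(r.get(c)) for r in rows)]
    temporal = [c for c in columns if all(isinstance(r.get(c), date) for r in rows)]
    categorical = [c for c in columns if c not in numeric and c not in temporal]

    if len(rows) == 1 and columns == numeric and len(numeric) == 1:
        return "kpi"   # a single number: show it as a headline figure
    if len(temporal) == 1 and numeric and not categorical:
        return "line"  # one time axis plus measures
    if len(categorical) == 1 and numeric and not temporal:
        return "bar"   # one dimension plus measures
    return "table"     # anything else: fall back to a plain table
```

Because the function reads nothing but its input, a parity test can feed the same fixture rows to the backend and frontend selectors and compare the returned labels.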
Lessons learned
- Replacing a custom `call_llm()` helper with `ChatDatabricks` plus `.with_fallbacks()` and autolog deleted roughly three hundred lines of retry, logging, and parsing — one library swap, an entire concern absorbed.
- Phase 26 was the lesson. The v1 frontend orchestrated plan→step→synthesise loops because the backend could not. LangGraph eliminates the need: the clean rewrite was cheaper than the retrofit because the v1 orchestration was load-bearing tech debt.
- Transparency as differentiator. Showing the SQL, the routing decision, and the per-node trace is what wins against an internal copilot of comparable raw capability. Executives accept "the model" when they can see the work and reject it when they cannot.
- Empirical Gap discipline. Some pieces are logically complete but cannot be validated without production access; marking them explicitly prevents shipped work from looking unfinished.
- Package conflict caught early. `databricks_langchain` (not `langchain_databricks`) is the namespace compatible with the Vector Search pin — codified as a permanent decision so it never recurs.
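
For readers unfamiliar with the fallback pattern in the first lesson: LangChain runnables expose `.with_fallbacks()`, which tries the listed alternatives in order when the primary call raises. The sketch below is a plain-Python stand-in for that behaviour, not the library code and not the project's implementation; `call_with_fallbacks` and the model callables are hypothetical names used only to show the control flow that the custom helper no longer has to own.

```python
from __future__ import annotations

from typing import Callable, Sequence


def call_with_fallbacks(
    prompt: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, model) in order; return (name, answer) from the first success."""
    errors: list[str] = []
    for name, model in models:
        try:
            return name, model(prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    # Simulates an unavailable endpoint.
    raise TimeoutError("endpoint timed out")


def secondary(prompt: str) -> str:
    return f"ok: {prompt}"


used, answer = call_with_fallbacks("hi", [("primary", primary), ("secondary", secondary)])
```

Recording which model actually answered, as the returned name does here, keeps the fallback visible in the trace rather than hidden behind a successful response.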
Status & roadmap
- Current state
- Backend and frontend feature-complete on the five-domain scope; LangGraph multi-agent spine running, three-tier cache live, and the ChatDatabricks fallback chain in production covering Sonnet, Haiku, and backup models.
- Next steps
- Activation of HITL via the `interrupt_before` checkpoint to confirm high-impact steps, production deployment of the four-LLM endpoint smoke test, and promotion of the MLflow Experiment to the production workspace with observability aligned to the platform team.