// CASE / 004 · 2026 · 5 MIN READ

rag for legal contract analysis. clauses, surfaced.

A Fortune 500 bank's legal and compliance teams needed to extract clauses, surface risks, and summarize complex agreements faster than manual review allowed. The fix was a grounded retrieval pipeline that knew what to bring back, and what not to.

client: major banking client (anonymized)
role: ibm solutions architect
stack: watsonx.ai · vector store · enterprise integrations
arc: pre-sales → production

context

The customer's legal and compliance organizations were spending review cycles measured in days per agreement — surfacing clauses, flagging risks, summarizing for downstream teams. Manual review didn't scale with deal volume, and the off-the-shelf summarization tools they had tried were either too generic to ground in their precedent or too aggressive to trust on regulated material.

Most "AI" wins in banking are really document-search wins wearing better clothes.

diagnosis

The interesting problem wasn't "summarize this contract." It was "summarize it the way this bank's legal team would summarize it, with citations they can defend in front of a regulator." Out-of-the-box LLMs don't do that; they hallucinate plausible clauses, and they don't know which precedent or addendum to weight. The work was grounding — pulling the right slice of the bank's own document corpus into context, and constraining generation to it.
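One way to make "constraining generation to it" concrete: give the model only the retrieved passages, each tagged with an ID, ask for structured items that cite those IDs, then reject anything citing a passage that was never retrieved. This is a minimal sketch under assumptions. The `Passage` shape, the passage IDs, the JSON keys, and both function names are hypothetical, and no watsonx.ai call is shown; the model output is simply passed in as a string.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    pid: str   # hypothetical passage ID, e.g. "acme-msa#14"
    text: str


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Put only the retrieved passages in context and require a citation per item."""
    context = "\n".join(f"[{p.pid}] {p.text}" for p in passages)
    return (
        "Answer using ONLY the passages below. Return a JSON list of objects "
        "with keys 'field', 'text' and 'citation', where 'citation' is one of "
        "the bracketed passage IDs. If the passages do not answer the question, "
        "return [].\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )


def validate_grounding(raw_output: str, passages: list[Passage]) -> tuple[list[dict], list[dict]]:
    """Split parsed model output into items that cite a retrieved passage and items that do not."""
    allowed = {p.pid for p in passages}
    items = json.loads(raw_output)
    grounded = [item for item in items if item.get("citation") in allowed]
    rejected = [item for item in items if item.get("citation") not in allowed]
    return grounded, rejected
```

In this sketch, anything in `rejected` would be flagged or dropped rather than shown as an answer. A citation-ID check proves provenance, not faithfulness: an item can cite a real passage and still misstate it, which is why a human review step stays in the loop.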

architecture

01 ingest · enterprise document stores connected to the indexing pipeline (PDFs, contracts, addenda, precedent)

02 embed + index · vector store keyed by document type, jurisdiction, and counterparty

03 retrieve + rerank · query-time semantic retrieval with reranking over candidate passages

04 ground + generate · watsonx.ai foundation models constrained to retrieved citations, structured outputs for clause / risk / summary fields

05 review · human-in-the-loop UI for legal teams to accept, edit, or reject each surfaced item, with an audit trail
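Steps 02 and 03 together behave like a two-stage funnel: filter candidates on the same metadata the index is keyed by, take the top k by vector similarity, then rescore that short list with a costlier scorer. A minimal sketch with stand-ins throughout. `embed` is bag-of-words counting rather than a real embedding model, `phrase_overlap` is a hypothetical reranker in place of a cross-encoder, and the metadata keys and passage IDs are illustrative; none of this is the vector-store or watsonx.ai API used in the engagement.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    pid: str
    text: str
    meta: dict  # hypothetical keys, e.g. {"doc_type": "MSA", "jurisdiction": "NY", "counterparty": "Acme"}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: bag-of-words term counts."""
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], filters: dict, k: int = 20) -> list[Chunk]:
    """Stage 1: keep chunks whose metadata matches every filter, then take the top k by similarity."""
    q = embed(query)
    pool = [c for c in chunks if all(c.meta.get(key) == val for key, val in filters.items())]
    return sorted(pool, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def rerank(query: str, candidates: list[Chunk], score: Callable[[str, str], float], n: int = 5) -> list[Chunk]:
    """Stage 2: rescore the short list with a costlier scorer (a cross-encoder in practice)."""
    return sorted(candidates, key=lambda c: score(query, c.text), reverse=True)[:n]


def phrase_overlap(query: str, text: str) -> float:
    """Hypothetical reranker stand-in: counts adjacent query word pairs that appear in the text."""
    q = tokens(query)
    padded = " " + " ".join(tokens(text)) + " "
    return float(sum(f" {a} {b} " in padded for a, b in zip(q, q[1:])))
```

The design choice worth noting is ordering: because the metadata filter runs before similarity, a change-of-control query scoped to one counterparty cannot pull a semantically similar clause from another counterparty's contract into the candidate list.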

what shipped

A production RAG pipeline on IBM watsonx.ai integrated with the bank's existing enterprise data stores. Legal teams query it the way they query a junior associate: "find the indemnity clauses in this MSA," "show me the change-of-control language across this counterparty's contracts." The system returns grounded answers with citations into the source documents. Compliance uses the same pipeline with different prompt templates for regulatory review.
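Sharing one pipeline between legal and compliance mostly means the retrieval and grounding steps stay fixed while the instructions and output fields change per team. A minimal sketch using Python's standard `string.Template`; the team names, template wording, field lists, and the `render` helper are hypothetical, not the templates used at the bank.

```python
from string import Template

# Hypothetical per-team templates; retrieval and grounding are shared upstream.
TEMPLATES = {
    "legal": Template(
        "You are assisting contract review. Using only the cited passages, "
        "extract clauses matching the request.\nFields: clause, citation\n"
        "Request: $request\nPassages:\n$passages"
    ),
    "compliance": Template(
        "You are assisting regulatory review. Using only the cited passages, "
        "flag language that may raise a compliance risk.\nFields: risk, rationale, citation\n"
        "Request: $request\nPassages:\n$passages"
    ),
}


def render(team: str, request: str, passages: list[tuple[str, str]]) -> str:
    """Render one team's template over the same retrieved (passage_id, text) pairs."""
    block = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    # KeyError for an unknown team is deliberate: no silent fallback template.
    return TEMPLATES[team].substitute(request=request, passages=block)
```

Keeping templates as data rather than code can also make them easy to version next to the eval set, so a template change gets regression-tested like any other change.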

what i'd change

The eval discipline was what earned production trust, but it lagged the architecture work. On the next engagement I'd build the rubric and the regression set in parallel with discovery, not after it. Also: clause extraction benefits from fine-tuned models more than from generic foundation models; v2 has room for LoRA adapters trained on the bank's own historical review data.
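A regression set can start as small as a list of queries paired with the passage IDs a reviewer accepted, scored on every pipeline change. A minimal sketch, assuming the system under test returns the set of passage IDs it cited for a query; the `Case` shape, the recall/precision gate, and the threshold values are illustrative choices, not the rubric used on this engagement.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    query: str
    expected: frozenset[str]  # passage IDs a reviewer marked as correct answers


def run_regression(
    cases: list[Case],
    system: Callable[[str], set[str]],
    min_recall: float = 0.9,
    min_precision: float = 0.8,
) -> dict:
    """Score a system (query -> set of cited passage IDs) against a reviewer-labelled set."""
    hits = returned = wanted = 0
    failures = []
    for case in cases:
        got = system(case.query)
        found = got & case.expected
        hits += len(found)
        returned += len(got)
        wanted += len(case.expected)
        if found != case.expected:
            failures.append(case.query)  # missed at least one expected passage
    recall = hits / wanted if wanted else 1.0
    precision = hits / returned if returned else 1.0
    passed = recall >= min_recall and precision >= min_precision
    return {"recall": recall, "precision": precision, "passed": passed, "failures": failures}
```

Building this alongside discovery, rather than after the architecture settles, means every retrieval or template change has a pass/fail signal from the first iteration.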