A Fortune 500 bank's legal and compliance teams needed to extract clauses, surface risks, and summarize complex agreements faster than manual review allowed. The fix was a grounded retrieval pipeline that knew what to bring back, and what not to.
The customer's legal and compliance organizations were spending days per agreement on review: surfacing clauses, flagging risks, and summarizing for downstream teams. Manual review didn't scale with deal volume, and the off-the-shelf summarization tools they had tried were either too generic to ground in the bank's own precedent or too aggressive to trust on regulated material.
The interesting problem wasn't "summarize this contract." It was "summarize it the way this bank's legal team would summarize it, with citations they can defend in front of a regulator." Out-of-the-box LLMs don't do that; they hallucinate plausible clauses, and they don't know which precedent or addendum to weight. The work was grounding — pulling the right slice of the bank's own document corpus into context, and constraining generation to it.
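To make "grounding" concrete, here is a minimal sketch of the two halves described above: pull a small slice of the corpus, then constrain the prompt to it. Everything in it is an assumption for illustration. The `Passage` type, the token-overlap ranking (a stand-in for whatever retriever the real system used), the document ids, and the prompt wording are hypothetical, not the bank's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str   # hypothetical document identifier
    section: str  # hypothetical section label
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by token overlap with the query and keep the top k.

    Token overlap is a toy stand-in for a real retriever (assumption).
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for overlap, p in ranked[:k] if overlap > 0]


def build_grounded_prompt(query: str, passages: list[Passage]) -> str:
    """Number the retrieved passages and instruct the model to stay inside them."""
    sources = "\n".join(
        f"[{i}] ({p.doc_id} sec. {p.section}) {p.text}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the numbered sources below. "
        "Cite a source number for every claim. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\n"
    )


# Hypothetical three-passage corpus, invented for this example.
corpus = [
    Passage("MSA-2021", "12.1", "The Supplier shall indemnify the Bank against third-party claims."),
    Passage("MSA-2021", "4.2", "Fees are payable within thirty days of invoice."),
    Passage("ADD-03", "2", "Indemnity obligations survive termination of the agreement."),
]
query = "indemnity obligations of the supplier"
hits = retrieve(query, corpus)
prompt = build_grounded_prompt(query, hits)
```

The point of the sketch is the second half as much as the first: the fee clause never reaches the prompt, so the model has nothing irrelevant to paraphrase, and every source carries an identifier a reviewer can trace.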
The result was a production RAG pipeline on IBM watsonx.ai, integrated with the bank's existing enterprise data stores. Legal teams query it the way they query a junior associate: "find the indemnity clauses in this MSA," "show me the change-of-control language across this counterparty's contracts." The system returns grounded answers with citations into the source documents. Compliance uses the same pipeline with different prompt templates for regulatory review.
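One way to picture "same pipeline, different prompt templates" is a per-team template applied to the same retrieved sources, plus a check that every citation in the answer points at a source that was actually retrieved. The sketch below uses only the Python standard library; the template wording, the `[n]` citation format, and the function names are assumptions for illustration, and no watsonx.ai call is shown.

```python
import re
from string import Template

# Hypothetical templates: identical sources, different instructions per team.
TEMPLATES = {
    "legal": Template(
        "You are assisting contract review. Using only the sources, $task\n"
        "Cite sources as [n].\n\nSources:\n$sources"
    ),
    "compliance": Template(
        "You are assisting regulatory review. Using only the sources, $task\n"
        "Flag any obligation with a regulatory impact. "
        "Cite sources as [n].\n\nSources:\n$sources"
    ),
}


def render_prompt(team: str, task: str, sources: list[str]) -> str:
    """Fill the chosen team's template with the task and the numbered sources."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return TEMPLATES[team].substitute(task=task, sources=numbered)


def invalid_citations(answer: str, n_sources: int) -> list[int]:
    """Return cited source numbers that match no retrieved source."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(c for c in cited if not 1 <= c <= n_sources)
```

For example, `invalid_citations("Indemnity is in [1]; survival is in [3].", n_sources=2)` returns `[3]`, which is the signal to reject or regenerate the answer rather than show a reviewer a citation that leads nowhere.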
The eval discipline was what earned production trust, but it lagged the architecture work. On the next engagement I'd build the rubric and the regression set in parallel with discovery, not after it. Also: clause extraction benefits more from fine-tuned models than from generic foundation models; v2 has room for LoRA adapters trained on the bank's own historical review data.
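A rubric plus a regression set does not need much machinery to start. The sketch below is one possible shape, with every name and case invented for illustration: each case records what a correct answer must cite and mention, a two-item rubric scores each answer, and the pass rate can gate changes to prompts or retrieval. The `fake_pipeline` function is a stand-in so the harness runs offline; a real rubric would be written with the reviewers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegressionCase:
    question: str
    must_cite: frozenset[str]      # source ids a correct answer has to cite
    must_mention: tuple[str, ...]  # phrases reviewers expect to see


@dataclass(frozen=True)
class Answer:
    text: str
    citations: frozenset[str]


def score(case: RegressionCase, answer: Answer) -> dict[str, bool]:
    """Apply a two-item rubric to one answer (illustrative criteria only)."""
    text = answer.text.lower()
    return {
        "cites_required_sources": case.must_cite <= answer.citations,
        "mentions_expected_terms": all(t.lower() in text for t in case.must_mention),
    }


def pass_rate(cases: list[RegressionCase], pipeline: Callable[[str], Answer]) -> float:
    """Fraction of cases where every rubric item passes."""
    passed = sum(all(score(c, pipeline(c.question)).values()) for c in cases)
    return passed / len(cases)


def fake_pipeline(question: str) -> Answer:
    # Stand-in for the real pipeline (assumption), so the harness runs offline.
    if "indemnity" in question:
        return Answer("Indemnity is covered in section 12.1.", frozenset({"MSA-2021:12.1"}))
    return Answer("No relevant clause found.", frozenset())


# Hypothetical regression cases, invented for this example.
cases = [
    RegressionCase("Where is the indemnity clause?",
                   frozenset({"MSA-2021:12.1"}), ("indemnity",)),
    RegressionCase("Where is the change-of-control clause?",
                   frozenset({"MSA-2021:18.3"}), ("change of control",)),
]
rate = pass_rate(cases, fake_pipeline)
```

Here the stand-in pipeline passes the first case and fails the second, so `rate` is 0.5. Starting this set during discovery means the cases come from the questions reviewers actually ask, and every later architecture change is measured against them.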