The Challenge
Our client, a Series A FinTech startup processing loan applications, had a problem that every fast-growing company eventually faces: their manual processes hadn't kept pace with their growth.
Three operations staff members were spending 8 hours a day reviewing incoming documents: identity verification docs, bank statements, proof of income, and business registrations. Each packet averaged 24 pages. The review required cross-referencing data across multiple documents, flagging inconsistencies, and making a preliminary recommendation before passing to an underwriter.
The error rate sat at 12%. Not catastrophic, but every missed inconsistency became a compliance liability. And with applications growing 40% quarter over quarter, they had two options: hire more people, or find a better way.
They came to us looking for the better way.
Our Approach
The first thing we do in every engagement is spend time understanding the process before touching any code. We spent two days shadowing the operations team, documenting their decision trees, and building a taxonomy of the document types and the rules that governed their review.
What we found was that 80% of the review work was deterministic — it followed rules that could be codified. The remaining 20% required genuine judgment, which we'd leave to the humans.
Architecture
We built a three-stage processing pipeline:
Stage 1 — Intake & Classification
Documents arrive via email or API upload. An AWS Lambda function triggers on S3 upload, classifies each document type using a fine-tuned classifier, and routes it to the appropriate extraction workflow.
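To make the intake stage concrete, here is a minimal sketch of a Lambda handler reacting to an S3 upload notification. The event shape follows the standard S3 notification format; everything else is an assumption for illustration: `classify_document` is a keyword placeholder standing in for the fine-tuned classifier, and the `WORKFLOWS` names and the `manual-triage` fallback are hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from document type to extraction workflow name.
WORKFLOWS = {
    "identity": "extract-identity",
    "bank_statement": "extract-bank-statement",
    "proof_of_income": "extract-income",
    "business_registration": "extract-registration",
}


def classify_document(key: str) -> str:
    """Placeholder for the fine-tuned classifier: guesses the type from the file name."""
    name = key.lower()
    if "passport" in name or "license" in name:
        return "identity"
    if "statement" in name:
        return "bank_statement"
    if "payslip" in name or "income" in name:
        return "proof_of_income"
    if "registration" in name:
        return "business_registration"
    return "unknown"


def handler(event, context):
    """Entry point for an S3 ObjectCreated notification."""
    routed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        doc_type = classify_document(key)
        # Unrecognized documents fall back to a manual queue (hypothetical name).
        workflow = WORKFLOWS.get(doc_type, "manual-triage")
        routed.append(
            {"bucket": bucket, "key": key, "doc_type": doc_type, "workflow": workflow}
        )
    return routed
```

In a real deployment the handler would hand each item to a queue or workflow engine rather than return a list; returning the routing decisions here just keeps the sketch easy to test locally.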
Stage 2 — Extraction & Validation
Each document type has a dedicated extraction chain powered by GPT-4o with vision. We engineered prompts that extract structured data (names, dates, figures, flags) and return JSON. A validation layer checks the output against business rules: date consistency, format compliance, and cross-document name matching.
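The validation layer is ordinary deterministic code that runs on the extracted JSON. The sketch below illustrates the three kinds of checks named above (format compliance, date consistency, cross-document name matching). The field names, the `REQUIRED_FIELDS` table, and the specific rules are assumptions chosen for illustration, not the client's actual rule set.

```python
from datetime import date

# Hypothetical required fields per document type.
REQUIRED_FIELDS = {
    "identity": ["full_name", "date_of_birth", "expiry_date"],
    "bank_statement": ["account_holder", "period_start", "period_end"],
}


def parse_iso(value):
    """Return a date for a 'YYYY-MM-DD' string, or None if missing or malformed."""
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def normalize_name(name):
    """Lowercase and collapse whitespace so trivial formatting differences don't flag."""
    return " ".join(name.lower().split())


def validate_document(doc_type, data, today):
    """Check one extracted document and return a list of issue strings."""
    issues = []
    for field in REQUIRED_FIELDS.get(doc_type, []):
        if not data.get(field):
            issues.append(f"missing field: {field}")
    if doc_type == "identity":
        expiry = parse_iso(data.get("expiry_date"))
        if data.get("expiry_date") and expiry is None:
            issues.append("expiry_date is not YYYY-MM-DD")  # format compliance
        elif expiry and expiry < today:
            issues.append("identity document expired")  # date consistency
    if doc_type == "bank_statement":
        start = parse_iso(data.get("period_start"))
        end = parse_iso(data.get("period_end"))
        if start and end and start > end:
            issues.append("statement period starts after it ends")
    return issues


def cross_check_names(identity, statement):
    """Compare the applicant name across two documents."""
    id_name = normalize_name(identity.get("full_name", ""))
    bank_name = normalize_name(statement.get("account_holder", ""))
    if id_name != bank_name:
        return ["name mismatch between identity document and bank statement"]
    return []
```

Exact matching after normalization is deliberately strict: an abbreviated name such as "A. Lovelace" gets flagged for a human to look at rather than silently accepted.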
Stage 3 — Recommendation & Routing
The system aggregates extracted data, runs the full compliance rule set, and generates a structured recommendation report. Cases that pass all rules go to a lightweight review queue. Cases with flags are escalated to senior staff with the specific issues highlighted.
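As an illustration of the recommendation step, the sketch below runs a small rule set over the aggregated data and picks a queue. The rule IDs, the two example rules (including the 10% income tolerance), the field names, and the queue names are all hypothetical; the point is the shape: every rule has an ID, and a flagged case carries the specific rules it failed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    rule_id: str
    description: str
    check: Callable[[dict], bool]  # returns True when the application passes


@dataclass
class Report:
    application_id: str
    flags: list = field(default_factory=list)
    queue: str = "lightweight-review"


# Hypothetical example rules; a real rule set would be far larger.
RULES = [
    Rule(
        "R001",
        "applicant name matches across documents",
        lambda a: a["id_name"] == a["bank_name"],
    ),
    Rule(
        "R002",
        "stated income within 10% of observed deposits",
        lambda a: abs(a["stated_income"] - a["observed_income"])
        <= 0.10 * a["stated_income"],
    ),
]


def build_report(application_id, aggregated, rules=RULES):
    """Run every rule and route: clean cases to light review, flagged cases to senior staff."""
    report = Report(application_id)
    for rule in rules:
        if not rule.check(aggregated):
            report.flags.append(
                {"rule_id": rule.rule_id, "failed_check": rule.description}
            )
    if report.flags:
        report.queue = "senior-review"
    return report
```

Running all rules instead of stopping at the first failure means the reviewer sees every issue on a case at once.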
What We Didn't Do
We deliberately did not try to automate the underwriter's final decision. That decision involves context, relationship history, and judgment that shouldn't be removed. The goal was to eliminate the grunt work so the humans could do the actual thinking.
Results
After a 3-week build and a 2-week parallel-run validation period, we flipped the system to primary.
Processing time: Down from an average of 4 hours per application packet to under 35 minutes — with no human involvement for compliant cases.
Manual review time: The operations team now spends under 2 hours per day instead of 8, and that time goes to higher-value work (edge-case review, exception handling, and quality-checking the AI output).
Error rate: From 12% to 0.4%. The AI doesn't get tired. It doesn't have a bad day. It applies the rules the same way every time.
Throughput: The company can now process 3× more applications without adding headcount.
Compliance posture: The structured output and audit trail actually improved their compliance documentation. Every decision is now logged with the rule that triggered it.
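One simple way to get an audit trail like this is to write one JSON line per rule evaluation. The sketch below shows the idea; the field names and the `audit_record` helper are assumptions for illustration, not the schema used on this project.

```python
import json
from datetime import datetime, timezone


def audit_record(application_id, rule_id, passed, detail, now=None):
    """Serialize one rule evaluation as a JSON line suitable for an append-only log."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": now.isoformat(),
            "application_id": application_id,
            "rule_id": rule_id,
            "outcome": "pass" if passed else "flag",
            "detail": detail,
        },
        sort_keys=True,
    )
```

Because each line records the rule ID alongside the outcome, an auditor can trace any flagged case back to the exact rule that triggered it.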
What We Learned
A few things stood out from this project that apply broadly:
The bottleneck is rarely the AI capability — it's the rule codification. The hardest part of this project was getting the operations team to articulate the implicit rules they followed. Once those were documented, the implementation was straightforward.
Multi-document reasoning requires explicit orchestration. Off-the-shelf solutions fall apart when you need to compare information across documents. We built explicit cross-reference checks rather than relying on the model to infer them from a combined prompt.
Parallel running is non-negotiable. We ran the AI pipeline alongside the manual process for 2 weeks before cutting over. That's where you find the edge cases — and there are always edge cases.
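For the parallel run, the comparison itself can be very simple. The sketch below assumes each process produces a decision string per application ID (the `compare_runs` helper and the "pass"/"flag" labels are hypothetical) and reports the agreement rate plus the disagreements, which are the cases worth reading one by one.

```python
from collections import Counter


def compare_runs(manual, automated):
    """Compare decisions from both processes; inputs map application_id -> decision."""
    # Only applications seen by both processes can be compared.
    shared = sorted(set(manual) & set(automated))
    disagreements = [
        (app_id, manual[app_id], automated[app_id])
        for app_id in shared
        if manual[app_id] != automated[app_id]
    ]
    agreement_rate = 1 - len(disagreements) / len(shared) if shared else 0.0
    # Count (manual, automated) pairs to see which direction the misses go.
    patterns = Counter((m, a) for _, m, a in disagreements)
    return {
        "compared": len(shared),
        "agreement_rate": agreement_rate,
        "disagreements": disagreements,
        "patterns": patterns,
    }
```

A disagreement is not automatically a pipeline bug: sometimes the automated run is right and the manual reviewer missed something, which is why each one deserves a human look before cutover.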