HANNAH KWAKYE

ESG Compliance Automation: How We Saved a Healthcare Client 62 Hours Per Month

14 April 2026 · 9 min read

A detailed case study of how we built an AI-powered ESG compliance mapping agent for a healthcare procurement business — and what it took to get it right.

The Client and the Problem

Aurum Aesthetics is a healthcare procurement business operating across multiple jurisdictions. As part of their procurement process, they were required to map supplier data against ESG (Environmental, Social, and Governance) compliance frameworks — a process that was entirely manual, highly time-consuming, and prone to inconsistency.

The specific problem: a compliance analyst was spending approximately 62 hours per month manually reviewing supplier documentation, cross-referencing it against framework requirements, and producing compliance reports. The work was repetitive, followed a consistent methodology, and was bottlenecking the procurement pipeline.

This was a textbook AI automation candidate.

The Architecture

Before writing a single line of code, we documented the architecture in full. The system needed to:

  1. Ingest supplier documentation in multiple formats (PDF, Word, Excel, web pages)
  2. Extract relevant data points against a defined taxonomy of ESG criteria
  3. Map extracted data against the applicable compliance framework
  4. Identify gaps where supplier data didn't meet framework requirements
  5. Generate a structured compliance report in the required format
  6. Flag edge cases for human review
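
To make that concrete, here is a rough sketch of how those six steps can be modelled as explicit pipeline states. The names and fields are my illustration for this write-up, not the production code:

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    INGESTED = auto()      # step 1: parsed and normalised
    EXTRACTED = auto()     # step 2: ESG data points pulled out
    MAPPED = auto()        # steps 3 and 4: framework mapping and gap analysis
    REPORTED = auto()      # step 5: compliance report generated
    HUMAN_REVIEW = auto()  # step 6: edge case routed to an analyst


@dataclass
class SupplierRecord:
    supplier_id: str
    source_path: str
    stage: Stage = Stage.INGESTED
    data_points: dict = field(default_factory=dict)   # criterion ID -> extracted value + confidence
    gaps: list = field(default_factory=list)          # framework criteria the supplier missed
    review_flags: list = field(default_factory=list)  # reasons the record needs a human
```

Carrying the stage on the record itself makes the audit trail trivial: at any point you can see exactly where every supplier sits in the pipeline.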

The architecture document became the test specification. Before building, we defined 30 test scenarios — including edge cases, ambiguous inputs, and the situations that had historically caused problems in the manual process.
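
To give a flavour of those scenarios, here is one sketched in pytest style against the data model above. The fixture path, criterion prefix, and `run_pipeline` helper are all hypothetical, and the test is written before the implementation exists, so it fails until the build catches up:

```python
def run_pipeline(path: str) -> SupplierRecord:
    raise NotImplementedError  # the system under test; built after the spec


def test_ambiguous_emissions_disclosure_escalates():
    # Scenario: the supplier discloses emissions, but in a format the
    # taxonomy doesn't anticipate -- the kind of ambiguity that caused
    # inconsistent mappings in the manual process.
    record = run_pipeline("fixtures/supplier_ambiguous_emissions.pdf")

    # The system must escalate rather than guess.
    assert record.stage is Stage.HUMAN_REVIEW
    assert any("ENV" in flag for flag in record.review_flags)
```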

The Build

The system was built using a multi-agent architecture:

Ingestion agent: Handles document parsing across formats, normalises the data, and passes it to the extraction agent.
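
A simplified version of the ingestion agent's format dispatch might look like the following. The parser bodies are stubs; in production each would wrap a real parsing library:

```python
from pathlib import Path


def parse_pdf(path: str) -> str:
    raise NotImplementedError  # stub: wrap a PDF text extractor here

def parse_docx(path: str) -> str:
    raise NotImplementedError  # stub: wrap a .docx reader here

def parse_xlsx(path: str) -> str:
    raise NotImplementedError  # stub: wrap a spreadsheet reader here

def parse_html(path: str) -> str:
    raise NotImplementedError  # stub: wrap an HTML-to-text converter here


PARSERS = {".pdf": parse_pdf, ".docx": parse_docx,
           ".xlsx": parse_xlsx, ".html": parse_html}


def ingest(path: str) -> str:
    """Normalise any supported document into plain text for extraction."""
    suffix = Path(path).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"Unsupported format: {suffix}")
    return PARSERS[suffix](path)
```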

Extraction agent: Uses a carefully tuned prompt to identify and extract relevant ESG data points from the normalised document. Outputs structured JSON.
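
In outline, the extraction step pins the model to the taxonomy and demands structured JSON back. The taxonomy entries, prompt wording, and `call_llm` stub below are placeholders, not the production prompt:

```python
import json

# Illustrative taxonomy entries; the real taxonomy is the client's.
TAXONOMY = {
    "ENV-1.1": "Scope 1 emissions reporting",
    "ENV-3.2": "Scope 3 emissions reporting",
    "SOC-2.1": "Labour policy disclosure",
    "GOV-1.4": "Anti-bribery controls",
}

EXTRACTION_PROMPT = """You are an ESG data extraction assistant.
From the supplier document below, extract evidence ONLY for these criteria:
{taxonomy}

Return a JSON object mapping each criterion ID to
{{"value": <extracted text or null>, "confidence": <number 0-1>}}.
Use null when the document does not address a criterion. Do not guess.

Document:
{document}"""


def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stub: the production system calls the model API here


def extract(document_text: str) -> dict:
    criteria = "\n".join(f"{cid}: {desc}" for cid, desc in TAXONOMY.items())
    prompt = EXTRACTION_PROMPT.format(taxonomy=criteria, document=document_text)
    return json.loads(call_llm(prompt))  # fails loudly if the model breaks format
```

Asking for null rather than a best guess is deliberate: it pushes uncertainty downstream to the mapping agent instead of hiding it.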

Mapping agent: Takes the extracted data and maps it against the compliance framework. Identifies gaps, assigns confidence scores, and flags items requiring human review.
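
A stripped-down version of that gap-and-confidence check might look like this. In this sketch the confidence estimate travels with each extracted value, and the threshold is illustrative:

```python
REVIEW_THRESHOLD = 0.8  # illustrative; the real threshold was tuned in testing


def map_to_framework(data_points: dict, framework: dict) -> tuple:
    """data_points: criterion ID -> {"value": ..., "confidence": float}
    framework: criterion ID -> True where the criterion is mandatory."""
    gaps, review_flags = [], []
    for criterion_id, required in framework.items():
        point = data_points.get(criterion_id) or {}
        value = point.get("value")
        confidence = point.get("confidence", 0.0)
        if required and value is None:
            gaps.append(criterion_id)
        elif value is not None and confidence < REVIEW_THRESHOLD:
            review_flags.append(
                f"{criterion_id}: low confidence ({confidence:.2f})")
    return gaps, review_flags
```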

Report generation agent: Takes the mapping output and generates the compliance report in the required format, including a summary section, detailed findings, and a gap analysis.
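
Report assembly is the most mechanical step. Even a plain string template covers the three sections; this skeleton is heavily abbreviated, and the client's required format is richer:

```python
REPORT_TEMPLATE = """COMPLIANCE REPORT: {supplier_id}

SUMMARY
{summary}

DETAILED FINDINGS
{findings}

GAP ANALYSIS
{gap_analysis}
"""


def render_report(record: SupplierRecord) -> str:
    return REPORT_TEMPLATE.format(
        supplier_id=record.supplier_id,
        summary=(f"{len(record.gaps)} gap(s) identified; "
                 f"{len(record.review_flags)} item(s) flagged for review."),
        findings="\n".join(
            f"- {cid}: {point.get('value') or 'not provided'}"
            for cid, point in record.data_points.items()),
        gap_analysis=("\n".join(f"- {cid}: requirement not met"
                                for cid in record.gaps)
                      or "No gaps identified."),
    )
```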

Orchestrator: Manages the workflow between agents, handles errors, and routes edge cases to the human review queue.
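
Tying it together, the orchestrator is essentially a guarded loop over the stages. Here is a condensed sketch composing the fragments above; note that any failure escalates to the review queue rather than dying silently:

```python
def process(record: SupplierRecord, framework: dict,
            review_queue: list) -> str | None:
    """Run one record through the pipeline. Returns the report text,
    or None if the record was escalated to human review."""
    try:
        text = ingest(record.source_path)
        record.data_points = extract(text)
        record.stage = Stage.EXTRACTED

        record.gaps, record.review_flags = map_to_framework(
            record.data_points, framework)
        record.stage = Stage.MAPPED

        if record.review_flags:
            record.stage = Stage.HUMAN_REVIEW
            review_queue.append(record)
            return None

        record.stage = Stage.REPORTED
        return render_report(record)
    except Exception as exc:
        # Fail safe: errors escalate to a human, never fail silently.
        record.review_flags.append(f"pipeline error: {exc}")
        record.stage = Stage.HUMAN_REVIEW
        review_queue.append(record)
        return None
```

A batch job would then just call `process` for each new supplier record and hand the review queue to the analyst.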

The Testing Phase

We tested the system against 30 real supplier documents from the client's existing database. The results:

  • Accuracy rate: 94% of data points correctly extracted and mapped on the first pass
  • Edge case handling: 8% of cases flagged for human review (the target was under 15%)
  • False negatives: 2% of compliance gaps missed (these were caught in the review process)
  • Processing time: Average 4 minutes per document, compared to 45 minutes manually

After two rounds of prompt refinement based on the test results, accuracy improved to 97% and the false negative rate dropped to under 1%.

The Results

After 90 days in production:

  • 62 hours per month of manual compliance mapping eliminated
  • API cost: Under £8/month (primarily GPT-4 token costs)
  • Compliance pipeline throughput: Increased roughly 3.4× (from 8 to 27 suppliers processed per month)
  • Consistency: 100% of reports now follow the same format and methodology
  • Human review time: Reduced from 62 hours to approximately 6 hours (reviewing flagged edge cases and spot-checking outputs)

The ROI was clear within the first month.

What Made This Project Work

Looking back, four factors made this project successful:

1. The problem was well-defined. We spent significant time upfront mapping the exact process, identifying the decision points, and documenting the edge cases. This investment paid off throughout the build.

2. The test specification preceded the build. By defining what success looked like before we started building, we had a clear target and could iterate toward it systematically.

3. We built for human oversight. The system was designed to flag uncertainty and route edge cases to humans — not to operate without oversight. This made the client comfortable deploying it and gave us a mechanism for continuous improvement.

4. We measured everything. From day one, the system tracked accuracy rates, processing times, and error patterns. This data drove the improvements in the first 90 days.

The Broader Lesson

This project is a good example of what AI automation looks like when it's done well: a specific problem, a well-designed solution, rigorous testing, and measurable results.

It's also a good example of what it doesn't look like: it wasn't built in a weekend, it wasn't deployed without testing, and it wasn't positioned as a replacement for human judgment. It was built as a tool that makes the human's job better — faster, more consistent, and focused on the work that actually requires human expertise.

That's the standard I hold every system I build to.

Ready to apply this?

Start with a free 30-minute diagnostic call.

BOOK A CALL