Harold is the intelligent layer between your incoming business documents and your ERP. It extracts structured data from any PDF, normalises it to match your ERP's exact requirements, and delivers it via Zapier — to Xero, Sage, Oracle, SAP, QuickBooks, or anything else in Zapier's ecosystem.
Every ERP system is good at storing, processing, and reporting on structured data. None of them are good at receiving unstructured documents — PDFs, scanned invoices, emailed purchase orders — and turning them into that structured data automatically.
That gap is filled, right now, by people. Someone downloads the PDF, reads it, types the data into the ERP, and moves on to the next one. It is slow, error-prone, and scales linearly with document volume. Every new supplier means more manual work.
Harold sits in that gap. It receives documents, understands them using AI trained specifically on your suppliers' formats, applies your business rules to normalise the data, and delivers clean structured records to your ERP — automatically, for every document, forever.
The challenges
A PDF invoice contains the same information as a database record, but it is locked inside a layout designed for printing, not for reading by software. Generic extraction pulls text from the page, but text without context is useless. '£1,250.00' appearing twice on the same invoice could be the subtotal or the total — the software has no way to know which is which without understanding the document's structure.
Your ERP has defined fields, required formats, and lookup tables. A supplier name must exactly match the name in your vendor master. A GL code must be a valid code in your chart of accounts. A date must be in the format the ERP expects, such as ISO 8601 (2024-03-15). The gap between 'text extracted from a PDF' and 'data that an ERP will accept' is currently filled by a human, and that is the problem Harold solves.
Direct PDF-to-ERP integrations — whether API-based or middleware-based — require development time, ongoing maintenance, and ERP-specific expertise. Every ERP has a different API. Every supplier sends PDFs in a different format. The combination creates an integration matrix that is expensive to build and almost impossible to maintain as documents and systems evolve.
Most businesses have the pieces — an ERP, an email system, a PDF viewer — but nothing in between that understands documents, applies business logic, and routes clean data to the right place. Harold is that intelligent layer. It sits between your incoming documents and your ERP systems, doing the work that currently falls to people.
How Harold works
Harold accepts PDFs, scanned images, and email attachments. Upload directly, email to your Harold automation inbox, or send programmatically via the API. Harold supports invoices, purchase orders, delivery notes, receipts, credit notes, remittances, and any custom document type you define.
Harold's DocuTrain system learns the layout of each supplier's documents. Once trained, it extracts every field — including line-level data — reliably, even as the supplier's format evolves. Confidence scoring flags low-confidence extractions for human review before they reach your ERP. You only touch what Harold isn't sure about.
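Confidence-based review can be pictured as a simple threshold check on each extracted field. The sketch below is illustrative only: the field structure, the 0.90 threshold, and the function name `route_document` are hypothetical and do not describe Harold's actual scoring, defaults, or API.

```python
from dataclasses import dataclass

# Hypothetical cut-off; Harold's real scoring and threshold are not specified here.
REVIEW_THRESHOLD = 0.90


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def route_document(
    fields: list[ExtractedField], threshold: float = REVIEW_THRESHOLD
) -> tuple[str, list[str]]:
    """Send the document to review if any field scores below the threshold."""
    low = [f.name for f in fields if f.confidence < threshold]
    return ("review", low) if low else ("approve", [])


fields = [
    ExtractedField("supplier_name", "Acme Supplies Ltd", 0.98),
    ExtractedField("invoice_total", "1250.00", 0.72),
]
print(route_document(fields))  # ('review', ['invoice_total'])
```

Only the low-scoring field is surfaced, which mirrors the idea that you only touch what Harold isn't sure about.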
The Rules Engine transforms extracted data into the exact format your ERP needs. Supplier names become vendor IDs. Dates reformat. VAT codes translate to GL codes. Currency symbols standardise. Any transformation you would otherwise do manually can be defined as a rule and applied automatically to every document from that point forward.
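A rule set like this can be sketched as a series of small transformations applied to every extracted record. In the example below, the vendor lookup table, the VAT-to-GL mapping, the DD/MM/YYYY input format, and all field names are hypothetical stand-ins for your own vendor master and chart of accounts; in Harold itself, rules are configured in the web app rather than written as code.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical lookup tables standing in for your vendor master and chart of accounts.
VENDOR_IDS = {"acme supplies ltd": "V0042"}
VAT_TO_GL = {"20%": "5000-STD", "0%": "5010-ZERO"}
CURRENCY_SYMBOLS = {"£": "GBP", "€": "EUR"}


def normalise(record: dict) -> dict:
    """Apply each example rule to one extracted record and return ERP-ready fields."""
    symbol, amount_text = record["total"][0], record["total"][1:]
    return {
        # Rule 1: supplier name -> vendor ID (case-insensitive lookup).
        "vendor_id": VENDOR_IDS[record["supplier_name"].strip().lower()],
        # Rule 2: UK-style DD/MM/YYYY date -> ISO 8601.
        "date": datetime.strptime(record["date"], "%d/%m/%Y").date().isoformat(),
        # Rule 3: VAT rate -> GL code.
        "gl_code": VAT_TO_GL[record["vat_rate"]],
        # Rule 4: currency symbol -> ISO 4217 code; thousands separators stripped.
        "currency": CURRENCY_SYMBOLS[symbol],
        "amount": Decimal(amount_text.replace(",", "")),
    }


raw = {
    "supplier_name": "Acme Supplies Ltd",
    "date": "15/03/2024",
    "vat_rate": "20%",
    "total": "£1,250.00",
}
print(normalise(raw))
```

In this sketch an unknown supplier or VAT code raises a `KeyError`; a production rule set would more sensibly flag the document for review instead of failing.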
Once reviewed and approved in Harold, documents trigger a Zapier webhook with clean structured JSON. Zapier routes the data to your ERP — creating a bill, a purchase order, a journal entry, or whatever your workflow requires. No custom API integration. No developer needed. If your ERP is in Zapier's 5,000+ app ecosystem, Harold can connect to it today.
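The hand-off to Zapier is an HTTP POST of JSON to a webhook URL. The sketch below builds and sends such a payload using only Python's standard library. The payload shape and the hook URL are hypothetical placeholders, not Harold's documented schema; on the Zapier side, a "Webhooks by Zapier" Catch Hook trigger is one way to receive a payload like this.

```python
import json
import urllib.request
from decimal import Decimal

# Placeholder URL; Zapier issues the real catch-hook URL when you create the trigger.
HOOK_URL = "https://hooks.zapier.com/hooks/catch/000000/example/"


def build_payload(doc_id: str, record: dict) -> bytes:
    """Serialise an approved record as JSON (hypothetical payload shape)."""
    body = {"document_id": doc_id, "document_type": "purchase_invoice", "data": record}
    # Decimal is not JSON-serialisable by default, so amounts are sent as strings.
    return json.dumps(body, default=str).encode("utf-8")


def send_to_zapier(payload: bytes, url: str = HOOK_URL) -> int:
    """POST the JSON payload to the webhook and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


payload = build_payload("doc-123", {"vendor_id": "V0042", "amount": Decimal("1250.00")})
print(payload.decode("utf-8"))
```

Sending amounts as strings avoids floating-point rounding; the Zap's ERP action can then map each field to the corresponding bill or journal field.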
Harold is not limited to invoices. Any business document can be trained and processed.
Supplier invoices for goods or services received
Invoices your business issues to customers
Orders sent to suppliers for goods or services
Goods received notes and delivery confirmations
Expense receipts and payment confirmations
Supplier credit notes and adjustments
Payment remittances from customers
Any document type — train Harold on your specific formats
Harold connects to any system with a Zapier connector. If it is in Zapier's 5,000+ app ecosystem, Harold can send data to it today — without custom development.
For ERPs without a Zapier connector, native CSV and Excel export is included on all plans.
You do not need a developer. Harold is designed to be set up by an accountant, finance manager, or operations team. Training templates, configuring rules, setting up the Zapier connection, and managing automations are all handled through the Harold web app, with no code required at any step.
If Harold receives a document from a supplier it has not been trained on, it will still attempt extraction using its default templates — which cover standard invoice and document layouts. The confidence score will reflect how certain Harold is about the extraction. Low-confidence documents are flagged for review. You can then train a new template from that document to improve future processing.
Harold workspaces are per-account. If you manage multiple entities, you would typically create separate Harold accounts for each. Multi-entity support within a single account is on the Harold roadmap.
Documents that Harold cannot extract with sufficient confidence are flagged for manual review rather than silently passed through with wrong data. You see the document, the attempted extraction, and the confidence scores — and can correct any errors before approving the document for export.
Harold processes documents on infrastructure hosted in the EU. Document data is stored securely and is not used to train general AI models. Full data processing agreements are available for enterprise customers.
The Enterprise plan processes up to 3,000 documents per month. For volumes above this, contact the Harold team to discuss a custom arrangement.
Start free. Train Harold on your first document type in under 10 minutes. No developer, no contract, no upfront cost.
From £29/month after trial. Cancel anytime.