Works with any ERP via Zapier — no custom integration required

PDF to ERP integration without the integration project

Harold is the intelligent layer between your incoming business documents and your ERP. It extracts structured data from any PDF, normalises it to match your ERP's exact requirements, and delivers it via Zapier — to Xero, Sage, Oracle, SAP, QuickBooks, or anything else in Zapier's ecosystem.

The intelligent document layer your ERP is missing

Every ERP system is good at storing, processing, and reporting on structured data. None of them are good at receiving unstructured documents — PDFs, scanned invoices, emailed purchase orders — and turning them into that structured data automatically.

That gap is filled, right now, by people. Someone downloads the PDF, reads it, types the data into the ERP, and moves on to the next one. It is slow, error-prone, and scales linearly with document volume. Every new supplier means more manual work.

Harold sits in that gap. It receives documents, understands them using AI trained specifically on your suppliers' formats, applies your business rules to normalise the data, and delivers clean structured records to your ERP — automatically, for every document, forever.

The challenges

Why PDF-to-ERP is harder than it looks

PDFs are not data — they are pictures of data

A PDF invoice contains the same information as a database record, but it is locked inside a layout designed for printing, not for reading by software. Generic extraction pulls text from the page, but text without context is useless. '£1,250.00' appearing twice on the same invoice could be the subtotal in one place and the total in the other. Software that reads only the text has no way to know which is which without understanding the document's structure.
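To make the point concrete, here is a minimal sketch in Python of why the label next to an amount matters. It is an illustration only, not a description of how Harold extracts data. The function name `label_amounts` and the sample lines are hypothetical, and it assumes each amount sits on the same line as its label, which real invoices often do not guarantee.

```python
import re

# Matches a sterling amount such as "£1,250.00" (illustrative pattern only).
AMOUNT = re.compile(r"£\s?([\d,]+\.\d{2})")


def label_amounts(lines: list[str]) -> dict[str, str]:
    """Pair each amount with the text that precedes it on the same line."""
    labelled = {}
    for line in lines:
        match = AMOUNT.search(line)
        if match is None:
            continue  # no amount on this line
        label = line[:match.start()].strip().rstrip(":").lower()
        labelled[label] = match.group(1)
    return labelled


# Hypothetical text pulled from a zero-rated invoice.
invoice_lines = ["Subtotal: £1,250.00", "VAT (0%): £0.00", "Total: £1,250.00"]
amounts = label_amounts(invoice_lines)
```

The two identical figures only become distinguishable once each is tied to its label. Scanned or multi-column layouts break this same-line assumption, which is where layout-aware extraction is needed.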

ERP systems expect data in very specific formats

Your ERP has defined fields, required formats, and lookup tables. A supplier name must match exactly the name in your vendor master. A GL code must be a valid code in your chart of accounts. A date must arrive in the one format the ERP accepts, for example ISO 8601. The gap between 'text extracted from a PDF' and 'data that an ERP will accept' is filled, right now, by a human, and that is the problem Harold solves.
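The checks described above can be sketched as a small validation function. This is an assumption-laden illustration, not Harold's implementation: the vendor master, chart of accounts, field names, and `validate_record` are all hypothetical, and it assumes the ERP wants ISO 8601 calendar dates.

```python
from datetime import date

# Hypothetical reference data; in practice these come from the ERP itself.
VENDOR_MASTER = {"Acme Supplies Ltd", "Northwind Traders"}
CHART_OF_ACCOUNTS = {"5000", "5100", "6200"}


def validate_record(record: dict) -> list[str]:
    """Return the reasons an ERP might reject this record (empty if none)."""
    errors = []
    if record.get("supplier_name") not in VENDOR_MASTER:
        errors.append("supplier_name does not match the vendor master exactly")
    if record.get("gl_code") not in CHART_OF_ACCOUNTS:
        errors.append("gl_code is not in the chart of accounts")
    try:
        date.fromisoformat(str(record.get("invoice_date", "")))
    except ValueError:
        errors.append("invoice_date is not an ISO 8601 date")
    return errors


# Raw text as it might come off a PDF: close to correct, but not acceptable.
raw = {"supplier_name": "ACME Supplies", "gl_code": "9999", "invoice_date": "12/03/2024"}
problems = validate_record(raw)
```

Each of the three fields here is readable to a person and still fails, which is the gap between extracted text and ERP-ready data.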

Custom integration is expensive and fragile

Direct PDF-to-ERP integrations — whether API-based or middleware-based — require development time, ongoing maintenance, and ERP-specific expertise. Every ERP has a different API. Every supplier sends PDFs in a different format. The combination creates an integration matrix that is expensive to build and almost impossible to maintain as documents and systems evolve.

The intelligent layer is missing from most workflows

Most businesses have the pieces — an ERP, an email system, a PDF viewer — but nothing in between that understands documents, applies business logic, and routes clean data to the right place. Harold is that intelligent layer. It sits between your incoming documents and your ERP systems, doing the work that currently falls to people.

How Harold works

Four stages from PDF to ERP record

Stage 1

Documents in — any format, any supplier

Harold accepts PDFs, scanned images, and email attachments. Upload directly, email to your Harold automation inbox, or send programmatically via the API. Harold supports invoices, purchase orders, delivery notes, receipts, credit notes, remittances, and any custom document type you define.

Stage 2

DocuTrain extracts the right data from every document

Harold's DocuTrain system learns the layout of each supplier's documents. Once trained, it extracts every field — including line-level data — reliably, even as the supplier's format evolves. Confidence scoring flags low-confidence extractions for human review before they reach your ERP. You only touch what Harold isn't sure about.
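As a rough illustration of confidence-based review, the sketch below flags any field whose score falls under a threshold. The threshold value, the field structure, and the function name `fields_needing_review` are assumptions for the example; Harold's actual scoring and cut-offs are not documented here.

```python
# Hypothetical cut-off; a real deployment would tune this per document type.
REVIEW_THRESHOLD = 0.90


def fields_needing_review(
    fields: dict[str, tuple[str, float]],
    threshold: float = REVIEW_THRESHOLD,
) -> list[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return [name for name, (_value, score) in fields.items() if score < threshold]


# Each field maps to (extracted value, confidence between 0 and 1).
extraction = {
    "invoice_number": ("INV-0042", 0.99),
    "invoice_date": ("2024-03-12", 0.97),
    "total": ("1250.00", 0.62),
}
flagged = fields_needing_review(extraction)
send_to_review = bool(flagged)
```

Only the uncertain field is surfaced to a reviewer; the rest of the document passes through untouched.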

Stage 3

Rules normalise data to match your ERP's expectations

The Rules Engine transforms extracted data into the exact format your ERP needs. Supplier names become vendor IDs. Dates reformat. VAT codes translate to GL codes. Currency symbols standardise. Any transformation you would otherwise do manually can be defined as a rule and applied automatically to every document from that point forward.
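The kinds of transformation listed above might look like the following sketch. The lookup tables, field names, and the `normalise` function are hypothetical and chosen for illustration; rules in Harold are configured in the web app rather than written as code, and the example assumes UK-style day/month/year dates on the source document.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical lookup tables standing in for user-defined rules.
VENDOR_IDS = {"acme supplies ltd": "V-1001"}
VAT_TO_GL = {"T1": "2200", "T0": "2201"}
CURRENCY_CODES = {"£": "GBP", "€": "EUR", "$": "USD"}


def normalise(extracted: dict) -> dict:
    """Apply each rule in turn and return an ERP-ready record."""
    raw_total = extracted["total"].strip()
    symbol, digits = raw_total[0], raw_total[1:].replace(",", "")
    parsed_date = datetime.strptime(extracted["invoice_date"], "%d/%m/%Y")
    return {
        "vendor_id": VENDOR_IDS[extracted["supplier_name"].strip().lower()],
        "invoice_date": parsed_date.date().isoformat(),
        "gl_code": VAT_TO_GL[extracted["vat_code"]],
        "currency": CURRENCY_CODES[symbol],
        "total": str(Decimal(digits)),
    }


record = normalise({
    "supplier_name": " Acme Supplies Ltd ",
    "invoice_date": "12/03/2024",
    "vat_code": "T1",
    "total": "£1,250.00",
})
```

A missing lookup raises `KeyError` in this sketch, which is deliberate: an unknown supplier or VAT code should stop the record rather than reach the ERP with a guessed value.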

Stage 4

Zapier delivers clean data to your ERP — no code required

Once reviewed and approved in Harold, documents trigger a Zapier webhook with clean structured JSON. Zapier routes the data to your ERP — creating a bill, a purchase order, a journal entry, or whatever your workflow requires. No custom API integration. No developer needed. If your ERP is in Zapier's 5,000+ app ecosystem, Harold can connect to it today.
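For readers curious what "clean structured JSON" could look like, here is a hedged sketch of a payload builder that only exports approved documents. The field names, the `status` values, and `build_webhook_payload` are assumptions made for this example; the real payload schema may differ, and no network call is made here.

```python
import json


def build_webhook_payload(document: dict) -> str:
    """Serialise an approved document to JSON; refuse anything unapproved."""
    if document.get("status") != "approved":
        raise ValueError("document must be reviewed and approved before export")
    payload = {
        "document_type": document["document_type"],
        "vendor_id": document["vendor_id"],
        "invoice_number": document["invoice_number"],
        "invoice_date": document["invoice_date"],
        "currency": document["currency"],
        "total": document["total"],
        "lines": document.get("lines", []),
    }
    return json.dumps(payload, sort_keys=True)


approved = {
    "status": "approved",
    "document_type": "purchase_invoice",
    "vendor_id": "V-1001",
    "invoice_number": "INV-0042",
    "invoice_date": "2024-03-12",
    "currency": "GBP",
    "total": "1250.00",
    "lines": [{"description": "Widgets", "quantity": 10, "unit_price": "125.00"}],
}
body = build_webhook_payload(approved)
```

In Zapier, a flat, consistently named payload like this can be mapped field by field onto the target app's "create bill" or "create purchase order" action.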

Every document type your business receives

Harold is not limited to invoices. Any business document can be trained and processed.

Purchase invoices

Supplier invoices for goods or services received

Sales invoices

Invoices your business issues to customers

Purchase orders

Orders sent to suppliers for goods or services

Delivery notes

Goods received notes and delivery confirmations

Receipts

Expense receipts and payment confirmations

Credit notes

Supplier credit notes and adjustments

Remittance advices

Payment remittances from customers

Custom documents

Any document type — train Harold on your specific formats

Works with your ERP — whatever it is

Harold connects to any system with a Zapier connector. If it is in Zapier's 5,000+ app ecosystem, Harold can send data to it today — without custom development.

Xero, QuickBooks Online, Sage 50, Sage 200, Sage Intacct, FreeAgent, Wave, Zoho Books, Oracle NetSuite, SAP Business One, Microsoft Business Central, Odoo, Airtable, Google Sheets, any system with a Zapier connector

Native CSV and Excel exports are also included on all plans for ERPs without Zapier connectors.

Frequently asked questions

Do I need a developer to set up Harold?

No. Harold is designed to be set up by an accountant, finance manager, or operations team — not a developer. Training templates, configuring rules, setting up the Zapier connection, and managing automations are all handled through the Harold web app. No code is required at any step.

How does Harold handle documents it has not seen before?

If Harold receives a document from a supplier it has not been trained on, it will still attempt extraction using its default templates — which cover standard invoice and document layouts. The confidence score will reflect how certain Harold is about the extraction. Low-confidence documents are flagged for review. You can then train a new template from that document to improve future processing.

Can Harold handle multiple entities or companies?

Harold workspaces are per-account. If you manage multiple entities, you would typically create a separate Harold account for each one. Multi-entity support within a single account is on the Harold roadmap.

What happens to documents that fail extraction?

Documents that Harold cannot extract with sufficient confidence are flagged for manual review rather than silently passed through with wrong data. You see the document, the attempted extraction, and the confidence scores — and can correct any errors before approving the document for export.

Is Harold GDPR compliant?

Harold processes documents on infrastructure hosted in the EU. Document data is stored securely and is not used to train general AI models. Full data processing agreements are available for enterprise customers.

Can Harold process high document volumes?

The Enterprise plan processes up to 3,000 documents per month. For volumes above this, contact the Harold team to discuss a custom arrangement.

Connect your documents to your ERP today

Start free. Train Harold on your first document type in under 10 minutes. No developer, no contract, no upfront cost.

From £29/month after trial. Cancel anytime.