A bookkeeper I was talking to last week described her morning like this. She opens her email. There are forty-three new messages. Most of them have something attached. Some are PDFs, some are phone photos, one is a scan of a scan. A few are bank statements she asked the client for three weeks ago. She knows that before lunch, all of it has to end up in Xero, with the right VAT codes, posted against the right supplier, in a form her senior can sign off on.
That gap — between the inbox full of stuff and the clean rows in the accounts package — is what people are starting to call document ingestion.
It is not OCR. It is not data entry. It is the bit in between.
OCR reads characters off a page. Data entry is a human typing numbers into a system. Ingestion is everything that has to happen between those two: catching the document, working out what it is, pulling the fields off it, and routing it to the right place.
If you have ever spent an afternoon renaming files, splitting a 40-page PDF into individual invoices, and then re-keying the totals into Sage, you have done ingestion by hand. It is the unglamorous work that fills up the day and never quite ends.
The four jobs
Strip it back and any ingestion layer is doing four things:
- Capture. Get the document into the system in whatever shape it arrived — PDF, phone photo, scan, multi-page bundle.
- Classify. Decide what it is. Invoice? Credit note? Bank statement? Receipt? Random delivery docket the client included by mistake?
- Extract. Pull the fields off the page. Supplier, date, totals, VAT, line items.
- Route. Send the structured data where it needs to go. Xero, QuickBooks, Sage, a CSV, an API, a human reviewer.
That is it. Everything else is detail.
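To make the four jobs concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works. The function names, the keyword matching and the regex for the total are all made up for this example, and it assumes the text has already been read off the page by an OCR step.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    filename: str
    text: str                      # assumed: text already produced by OCR
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)


def capture(filename: str, text: str) -> Document:
    """Capture: take the document in, in whatever shape it arrived."""
    return Document(filename=filename, text=text)


def classify(doc: Document) -> Document:
    """Classify: decide what it is (naive keyword match, for illustration)."""
    lowered = doc.text.lower()
    # "credit note" is checked before "invoice" because credit notes
    # often mention the invoice they relate to.
    for label in ("credit note", "bank statement", "invoice", "receipt"):
        if label in lowered:
            doc.doc_type = label
            break
    return doc


def extract(doc: Document) -> Document:
    """Extract: pull fields off the page (only a total, in this sketch)."""
    match = re.search(r"\btotal[:\s]+(\d+\.\d{2})", doc.text, re.IGNORECASE)
    if match:
        doc.fields["total"] = match.group(1)
    return doc


def route(doc: Document, client_package: str) -> str:
    """Route: pick a destination, falling back to a human reviewer."""
    if doc.doc_type == "unknown" or "total" not in doc.fields:
        return "human review"
    return client_package


def ingest(filename: str, text: str, client_package: str):
    """Run the four jobs in order and return (destination, document)."""
    doc = extract(classify(capture(filename, text)))
    return route(doc, client_package), doc
```

Called as ingest("inv-1001.pdf", "INVOICE\nTotal: 123.00", "xero"), this hands the invoice to Xero. A delivery docket with no recognisable type ends up with the human reviewer instead. A real ingestion layer would replace each function with something far more capable, but the shape stays the same.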
Why this is becoming its own category
For a long time, this work was just bundled into "the accounts package". Xero had Hubdoc. Sage had its own bits. QuickBooks had something. The trouble is, those tools are built to feed one destination: their own. If your practice runs three clients on three different packages, you end up with three ingestion stacks, three logins, three places where things go wrong.
What has changed is volume and variety. A typical practice now handles more documents, in more formats, from more sources, than the bundled tools were ever designed for. And the cost of getting it wrong, whether a missed VAT3 reconciliation or a duplicate supplier payment, has not gone down.
So the ingestion layer has started to peel off and become its own thing. Independent. Multi-destination. Designed to deal with the mess at the front of the pipeline so the accounting package only ever sees clean rows at the back.
What to look for
If you are evaluating one of these, the questions worth asking are short:
- Can it handle the formats your clients actually send you, including phone photos and multi-document PDFs?
- Does it know Irish (or UK) VAT, or is it generic?
- Can you send the output to whichever accounting package the client uses, without paying for a different tool each time?
- How fast is the review step when extraction is not quite right? Because it will not always be right.
- Where does the data live? GDPR is not optional.
That is the bar. Anything that meets it will give the bookkeeper back a chunk of her morning. Anything that does not is just more software in the stack.
Where KrinoDoc fits
We built KrinoDoc as that ingestion layer. Documents in, structured data out, your choice of destination. No lock-in to a single accounting package, no template per supplier, no asking the client to please use the standard scanner this time. The mess goes in, the clean rows come out, and the human time gets spent on the bit a human is actually needed for — the judgement call.
That is the pitch. The bookkeeper with the forty-three emails is who we built it for.
