I’m Kunal Bargotra, Product Manager at Adlib, and I’m excited to share some insights about one of the most important (and frankly, most frustrating) challenges facing businesses today: extracting critical data from documents.
Every organization deals with invoices, contracts, employee agreements, and other files that are packed with essential information. But getting to that information? That’s a different story. Some of it is buried in multi-line text, scattered across pages, or hidden behind different labels like "Bill To," "Ship To," or "Employer Name." If you’ve ever tried to automate data extraction from these kinds of documents, you know it’s not as simple as pointing to a field and saying, “That’s the one.”
With our Adlib Transform 2024.2 release, we’ve made it possible to extract data from both structured and unstructured documents using our AiLink connector to Large Language Models (LLMs), and we’ve done it in a way that’s intuitive, flexible, and built for real-world business challenges.
I want to walk you through why unstructured data extraction is so difficult, how Adlib makes it easier, and what sets us apart.
If you’ve worked with OCR (optical character recognition) tools, you know they’re great at turning images into text. But OCR reads line by line, left to right, with no real sense of context. It’s like scanning a page with a highlighter but having no clue what’s actually important.
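Here’s where it gets tricky: the information you need is often spread over several lines or across multiple pages, the same kind of information can appear under different labels such as "Bill To" or "Ship To," and raw OCR output carries no sense of which text matters or how the pieces relate to each other.
These challenges are why most data extraction systems fall back on rigid templates, and why they break the moment the document format changes.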
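To make the template problem concrete, here is a minimal Python sketch of the rigid, label-based approach such systems fall back on. It is not how Adlib works; the template, regular expressions, and sample text are hypothetical. Each field is tied to one exact label, so a document that says "Ship To" instead of "Bill To" is enough to break it.

```python
import re

# Hypothetical "template": each field is tied to one fixed label.
INVOICE_TEMPLATE = {
    "Customer Address": re.compile(r"Bill To:\s*(.+)"),
    "Invoice ID": re.compile(r"Invoice ID:\s*(\S+)"),
}

def extract_with_template(text: str, template: dict) -> dict:
    """Return the first match for each field, or None when its exact label is absent."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result

# Same information, two layouts.
doc_a = "Invoice ID: #24210\nBill To: 123 Main Street, Toronto, ON"
doc_b = "#24210\nShip To: 123 Main Street, Toronto, ON"

print(extract_with_template(doc_a, INVOICE_TEMPLATE))
print(extract_with_template(doc_b, INVOICE_TEMPLATE))
```

The first call finds both fields; the second returns None for both, even though the address and invoice number are still on the page.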
In Adlib Transform 2024.2 we’ve built a system that understands documents. By combining document transformation, LLMs, and prompt engineering, we’ve created a solution that can handle both structured and unstructured content.
Here’s how it works:
The process starts by taking in a document (PDFs, scanned images, etc.) and using OCR to convert it into machine-readable text. But we go further than traditional OCR by adding logic and structure to that text, which primes it for extraction. This helps the LLM see the document the way a human would.
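One common way to add structure to raw OCR output is to rebuild reading order from word positions and mark page boundaries, so that a prompt can refer to things like "the first page." The sketch below assumes a hypothetical `Word` record carrying a page number and bounding-box coordinates; it illustrates the general idea, not Adlib’s own processing.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    page: int   # 1-based page number
    x: float    # left edge of the word's bounding box
    y: float    # top edge of the word's bounding box

def structure_ocr_output(words: list[Word], y_tolerance: float = 5.0) -> str:
    """Rebuild reading order from OCR word boxes and add page markers."""
    output: list[str] = []
    for page in sorted({w.page for w in words}):
        output.append(f"--- Page {page} ---")
        lines: list[list[Word]] = []
        page_words = sorted((w for w in words if w.page == page), key=lambda w: (w.y, w.x))
        for word in page_words:
            # A word within y_tolerance of the current line's first word joins that line;
            # otherwise it starts a new line.
            if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
                lines[-1].append(word)
            else:
                lines.append([word])
        # Within each line, order words left to right.
        output.extend(" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines)
    return "\n".join(output)

# OCR engines do not always return words in reading order.
words = [
    Word("#24210", 1, 80, 11),
    Word("Invoice", 1, 10, 10),
    Word("To:", 1, 50, 41),
    Word("Ship", 1, 10, 40),
    Word("Neil", 1, 10, 60),
    Word("Thomas", 1, 50, 60),
]
print(structure_ocr_output(words))
```

The vertical tolerance lets words on the same visual line join even when their boxes are a few units apart; real layouts with columns or tables need considerably more than this.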
Here’s the part I’m most excited about. Instead of forcing users to create templates for every single document type, we let you define what you want to extract.
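For example, if you’re working with invoices, you might tell the system to extract the customer name, customer address, invoice ID, line items, and total amount.
If you’re working with an employee agreement, you might want to extract the employee name, employer name, base salary, and annual bonus percentage.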
Instead of making you hunt for each field yourself, Adlib’s system knows how to find them, even if they’re phrased differently from one document to the next.
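A field definition can be as simple as a name plus a short plain-English description. The field names below come from the sample JSON output later in this post; the `FieldSpec` structure, the descriptions, and the `describe_fields` helper are hypothetical illustrations, not Adlib’s configuration format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    name: str              # key expected in the extracted JSON
    description: str       # plain-English hint about what the field means
    is_list: bool = False  # True when the document may contain several values

INVOICE_FIELDS = [
    FieldSpec("Customer Name", "The customer being billed."),
    FieldSpec("Customer Address", "The customer's address; it may be labeled 'Bill To' or 'Ship To'."),
    FieldSpec("Invoice ID", "The invoice number, even if it is not labeled 'Invoice ID'."),
    FieldSpec("Items", "Each product or service listed on the invoice.", is_list=True),
    FieldSpec("Total Amount", "The final amount due."),
]

EMPLOYMENT_AGREEMENT_FIELDS = [
    FieldSpec("Employee Name", "The person being employed."),
    FieldSpec("Employer Name", "The organization doing the hiring."),
    FieldSpec("Base Salary", "The base salary, including its pay period."),
    FieldSpec("Annual Bonus Percentage", "The annual bonus as a percentage."),
]

def describe_fields(fields: list[FieldSpec]) -> str:
    """Render field definitions as a bulleted list suitable for an LLM prompt."""
    lines = []
    for field in fields:
        kind = "list of strings" if field.is_list else "string"
        lines.append(f"- {field.name} ({kind}): {field.description}")
    return "\n".join(lines)

print(describe_fields(EMPLOYMENT_AGREEMENT_FIELDS))
```

Describing what a field means, rather than where it sits on the page, is what lets the same definition work across differently formatted documents.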
This is one of my favorite parts. With Adlib’s interface, you don’t need to be an AI expert to guide the LLM. You can provide simple, plain-English instructions.
Here’s an example:
“Look for the customer address on the top-right of the first page.”
This plain-language approach means you don’t have to learn AI syntax or code prompts from scratch. You just tell the system what to look for, and it takes care of the rest.
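Plain-English guidance like this still has to be combined with the document text and the list of fields before it reaches the model. A minimal sketch of that assembly step, assuming a hypothetical `build_extraction_prompt` helper and illustrative prompt wording (not Adlib’s internal prompt):

```python
def build_extraction_prompt(document_text: str, field_names: list[str], instructions: list[str]) -> str:
    """Combine the structured document text, target fields, and plain-English hints into one prompt."""
    parts = [
        "Extract the following fields from the document and reply with a single JSON object.",
        "Use exactly these keys: " + ", ".join(field_names) + ".",
        "If a field cannot be found, use null as its value.",
    ]
    # User-supplied hints are passed through verbatim.
    if instructions:
        parts.append("Additional guidance:")
        parts.extend(f"- {hint}" for hint in instructions)
    parts.append("Document:")
    parts.append(document_text)
    return "\n".join(parts)

prompt = build_extraction_prompt(
    document_text="--- Page 1 ---\nInvoice #24210\nShip To:\n123 Main Street, Toronto, ON",
    field_names=["Customer Address", "Invoice ID"],
    instructions=["Look for the customer address on the top-right of the first page."],
)
print(prompt)
```

Asking for fixed keys and an explicit null for missing values makes the reply easier to check automatically afterwards.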
The real magic happens here. Adlib doesn't just "read" — it "understands."
In one of our invoice examples, the system knew that “Ship To” was the Customer Address, even though it wasn’t explicitly labeled. It also recognized that #24210 was an Invoice ID, even though it wasn’t called "Invoice ID" anywhere.
In an employment agreement, the system was able to recognize that the signatures at the bottom belonged to the Employer (Jim Davis) and the Employee (Neil Thomas) based on context from earlier in the document. This kind of contextual understanding comes from using LLMs and priming them properly through Adlib’s custom interface.
See the two examples below.
Once the document has been processed, the extracted information is output as a JSON file or another machine-readable format that can be fed directly into your downstream business systems. No manual reviews. No rework. Just clean, actionable data.
For example, in the case of an invoice, you get a file that looks something like this:
```json
{
  "Customer Name": "Neil Thomas",
  "Customer Address": "123 Main Street, Toronto, ON",
  "Invoice ID": "#24210",
  "Items": [
    "Belkin Router Accessories",
    "Bluetooth Adapter"
  ],
  "Total Amount": "$350.00"
}
```
For employee agreements, you might extract:
```json
{
  "Employee Name": "Neil Thomas",
  "Employer Name": "Rockwell Equity Trust",
  "Base Salary": "$105,000 per year",
  "Annual Bonus Percentage": "18%"
}
```
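Before output like this reaches another system, it’s worth checking that the model returned valid JSON with exactly the fields you asked for. A minimal sketch using Python’s standard `json` module; the `parse_extraction` helper is hypothetical:

```python
import json

def parse_extraction(response_text: str, expected_fields: list[str]) -> dict:
    """Parse the model's JSON reply and check it has exactly the expected keys."""
    data = json.loads(response_text)  # raises json.JSONDecodeError (a ValueError subclass) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [f for f in expected_fields if f not in data]
    unexpected = [k for k in data if k not in expected_fields]
    if missing or unexpected:
        raise ValueError(f"missing fields: {missing}; unexpected fields: {unexpected}")
    return data

reply = (
    '{"Employee Name": "Neil Thomas", "Employer Name": "Rockwell Equity Trust", '
    '"Base Salary": "$105,000 per year", "Annual Bonus Percentage": "18%"}'
)
fields = ["Employee Name", "Employer Name", "Base Salary", "Annual Bonus Percentage"]
record = parse_extraction(reply, fields)
print(record["Base Salary"])  # $105,000 per year
```

Rejecting a reply outright is a simple policy; a production workflow might instead route it to a human reviewer or retry the request.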
With this data, you can integrate directly into financial systems, HR platforms, and other business workflows.
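Financial and HR systems usually want numbers rather than strings like "$105,000 per year" or "18%". A minimal normalization sketch, assuming simple US-style formatting (comma thousands separators, period decimals); `parse_money` and `parse_percentage` are hypothetical helpers, and `Decimal` is used to avoid floating-point rounding on currency values:

```python
import re
from decimal import Decimal

def parse_money(value: str) -> Decimal:
    """Pull the first amount out of a string like '$105,000 per year'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no amount found in {value!r}")
    # Drop thousands separators before converting.
    return Decimal(match.group().replace(",", ""))

def parse_percentage(value: str) -> Decimal:
    """Convert a string like '18%' to a fraction, e.g. Decimal('0.18')."""
    match = re.search(r"\d+(?:\.\d+)?(?=\s*%)", value)
    if match is None:
        raise ValueError(f"no percentage found in {value!r}")
    return Decimal(match.group()) / 100

print(parse_money("$105,000 per year"))  # 105000
print(parse_percentage("18%"))           # 0.18
```

Note that this drops the currency and pay period; a real integration would keep them alongside the number.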
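Unlike traditional extraction tools, Adlib’s system doesn’t require a template for every document type, handles both structured and unstructured content, and can be guided with plain-English instructions.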
This means no more manual keying, no more rework, and no more frustration.
If you'd like to see how it works on your own documents, I’d love to show you. Reach out to your Adlib rep, and we can walk you through it.
Thanks for reading,
Kunal Bargotra, Product Manager, Adlib