Extract structured data from invoices.
vlm-1 can extract structured data from invoices in PDF or image format, along with the visual grounding for each extracted field. Here’s a step-by-step guide on how to parse an invoice:
Here is a visualization of a parsed invoice, along with the visual grounding that vlm-1 can extract from it. Notice that only the specific items requested in the schema are retrieved and visualized, unlike OCR, which returns all text in the document with no context:
Parsing an invoice with visual grounding enabled
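To make the idea of visual grounding concrete, here is a minimal sketch of how a grounded field might be turned into a pixel box for drawing on the page image. It is not the vlm-1 response format: the GroundedField class, the normalized (x, y, width, height) box layout, and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroundedField:
    """Hypothetical container for one extracted field and where it sits on the page."""
    name: str
    value: str
    # Assumed layout: normalized (x, y, width, height), each in the range [0, 1].
    bbox: Tuple[float, float, float, float]


def to_pixel_box(field: GroundedField, page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Convert the normalized box to pixel corners (left, top, right, bottom)."""
    x, y, w, h = field.bbox
    left = round(x * page_width)
    top = round(y * page_height)
    right = round((x + w) * page_width)
    bottom = round((y + h) * page_height)
    return left, top, right, bottom


# Made-up example field; a real response would come from the API.
field = GroundedField(name="invoice_id", value="INV-001", bbox=(0.5, 0.25, 0.25, 0.125))
box = to_pixel_box(field, page_width=800, page_height=1000)
print(field.name, field.value, box)
```

Because only schema fields carry a box, a viewer built this way would highlight just the requested items rather than every word on the page.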
Submit an Invoice Parsing Job
Wait for the Job to Complete
Use the predictions.wait method to wait for the job to complete. To enable visual grounding, pass the config=GenerationConfig(grounding=True) parameter when submitting the job.