Invoice Parsing Demo
Head over to the invoice-parsing playground to see invoice parsing in action.

vlm-1 can extract structured data from invoices in PDF or image format, along with the visual grounding for each extracted field.

The visualization of a parsed invoice shows the visual grounding that vlm-1 can extract. Notice that only the specific items requested in the schema are retrieved and visualized, unlike OCR, which returns all text in the document with no context.

Here's a step-by-step guide on how to parse an invoice:
Once the job has been submitted, you can wait for it to complete by calling the predictions.wait method:
# Wait for the job to complete
response: PredictionResponse = client.predictions.wait(
    id=response.id,
    timeout=120,
)
print(f"Job completed:\n {response.model_dump()}")
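To illustrate the general idea behind waiting with a timeout, here is a minimal, self-contained sketch of a polling loop. It is not how predictions.wait is implemented; the helper name `wait_until_done`, the status strings ("pending", "completed", "failed"), and the fake status source are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    poll_interval: float = 1.0,
) -> str:
    """Poll get_status() until it reports a terminal state, or raise on timeout.

    The status strings used here are hypothetical, not taken from the vlm-1 API.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a real status lookup: reports "pending" twice, then "completed".
_fake_statuses = iter(["pending", "pending", "completed"])
final_status = wait_until_done(
    lambda: next(_fake_statuses), timeout=5, poll_interval=0.01
)
print(final_status)
```

A timeout guards against waiting forever on a job that never finishes; in practice you would likely catch the timeout error and decide whether to retry or report the failure.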
For higher-quality results, you can enable visual grounding, which helps the model understand the invoice and extract more accurate information. To do so, pass config=GenerationConfig(grounding=True) when submitting the job (as shown below).
from vlmrun.client.types import GenerationConfig

# Enable grounding when submitting the job
response: PredictionResponse = client.document.generate(
    file=Path("<path/to/invoice.pdf>"),
    domain="document.invoice",
    batch=True,
    config=GenerationConfig(grounding=True),
)
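When visualizing grounded fields, a common step is mapping each field's bounding box onto the page image. The sketch below assumes boxes arrive as normalized (x, y, w, h) values in the range 0 to 1; that format, the helper name `bbox_to_pixels`, and the example values are assumptions for illustration, so check the actual response before relying on them.

```python
from typing import Tuple


def bbox_to_pixels(
    xywh: Tuple[float, float, float, float],
    page_width: int,
    page_height: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized (x, y, w, h) box to pixel (left, top, right, bottom)."""
    x, y, w, h = xywh
    left = round(x * page_width)
    top = round(y * page_height)
    right = round((x + w) * page_width)
    bottom = round((y + h) * page_height)
    return left, top, right, bottom


# Hypothetical box for one grounded field; the values are made up for illustration.
example_box = (0.1, 0.2, 0.5, 0.05)
pixel_box = bbox_to_pixels(example_box, page_width=1000, page_height=2000)
print(pixel_box)
```

The resulting (left, top, right, bottom) tuple is the corner format that most image libraries expect when drawing a rectangle over the page.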