vlm-1 can extract structured data from invoices in PDF or image format. Here’s a step-by-step guide on how to parse an invoice:

For higher-quality results, you can enable Visual Grounding to help the model understand the invoice and extract more accurate information. See High-Quality Invoice Parsing with Grounding for more details.

Parsing Invoices in 2 Steps

1

Submit an Invoice Parsing Job

from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import FileResponse

# Initialize the client
client = VLMRun(api_key="<your-api-key>")

# Submit the invoice for parsing
response: PredictionResponse = client.document.generate(
    file=Path("<path/to/invoice.pdf>"),
    domain="document.invoice",
    batch=True,
)
print(f"Job submitted:\n {response.model_dump()}")

You should see a response like this:

Job submitted:
{
  "id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
  "created_at": "2024-08-15T02:22:09.157788",
  "response": null,
  "status": "pending"
}
2

Wait for the Job to Complete

You can now wait for the job to complete by calling the predictions.wait method:

# Wait for the job to complete
response: PredictionResponse = client.predictions.wait(
    id=response.id,
    timeout=120,
)
print(f"Job completed:\n {response.model_dump()}")

Illustrative Example

Here is an example of the structured JSON output that vlm-1 can extract from an invoice:

Parsing an invoice with `vlm-1`

You should see a response like this:

Job completed:
{
    "id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
    "created_at": "2024-08-15T02:22:09.157788",
    "status": "completed",
    "response": {
        "invoice_id": "79BBD516-0005",
        "period_start": null,
        "period_end": null,
        "invoice_issue_date": "2024-01-10",
        "invoice_due_date": "2024-02-09",
        "order_id": null,
        "customer_id": null,
        "issuer": "Typographic",
        "issuer_address": {
            "street": "1 Grand Canal St Lower",
            "city": "Dublin",
            "state": "Co. Dublin",
            "postal_code": "D04 Y7R5",
            "country": "Ireland"
        },
        "customer": "French Customer",
        "customer_email": null,
        "customer_phone": "+33 1 23 45 67 89",
        "customer_billing_address": {
            "street": "5 Avenue Anatole France",
            "city": "Champ de Mars",
            "state": "Paris",
            "postal_code": "75007",
            "country": "France"
        },
        "customer_shipping_address": null,
        "items": [
            {
            "description": "Line Item 1",
            "quantity": 1,
            "currency": "EUR",
            "unit_price": 10.0,
            "total_price": 10.0
            },
            {
            "description": "Line Item 2",
            "quantity": 1,
            "currency": "EUR",
            "unit_price": 5.0,
            "total_price": 5.0
            }
        ],
        "subtotal": 15.0,
        "tax": 0.0,
        "total": 15.0,
        "currency": "EUR",
        "notes": "[1] Tax to be paid on reverse charge basis",
        "others": {
            "due_amount": 15.0,
            "vat_number": "FRAB123456789",
            "support_email": "support@typographic.com",
            "contact_phone": "+353123456789"
        }
    }
}

High-Accuracy Parsing with Grounding

For higher-quality results, you can enable Visual Grounding to help the model understand the invoice and extract more accurate information. You can do this by setting the config=GenerationConfig(grounding=True) parameter when submitting the job (as shown below).

from vlmrun.client.types import GenerationConfig

# Enable grounding when submitting the job
response: PredictionResponse = client.document.generate(
    file=Path("<path/to/invoice.pdf>"),
    domain="document.invoice",
    batch=True,
    config=GenerationConfig(grounding=True),
)

Try our Document -> JSON API today

Head over to our Document -> JSON to start building your own document processing pipeline with VLM Run. Sign-up for access on our platform.