Parsing Invoices
Extract structured data from invoices.
Getting Started
VLM-1 can extract structured data from invoices in PDF or image format. Here’s a step-by-step guide on how to parse an invoice:
Upload Invoice
Use the /v1/files endpoint to upload the invoice you want to parse.
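The upload step can be sketched with the Python standard library as follows. This is a minimal sketch under stated assumptions, not the official client: the base URL, the VLM_API_BASE and VLM_API_KEY environment variables, the X-API-Key header, the "file" form field, and the "purpose" form field are all placeholders chosen for illustration. Only the /v1/files path comes from this guide, so check the API reference for the real authentication scheme and field names.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumptions: base URL, env var names, and auth header are placeholders.
API_BASE = os.environ.get("VLM_API_BASE", "https://api.example.com")
API_KEY = os.environ.get("VLM_API_KEY", "")


def build_multipart(file_field, filename, data, fields=None):
    """Encode one file plus optional text fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = []
    for name, value in (fields or {}).items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def upload_invoice(path, purpose="assistants"):
    """POST a local PDF or image to /v1/files and return the parsed JSON."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            "file",  # hypothetical form field name
            os.path.basename(path),
            f.read(),
            {"purpose": purpose},  # hypothetical form field
        )
    request = urllib.request.Request(
        f"{API_BASE}/v1/files",
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "X-API-Key": API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling upload_invoice("invoice_example.pdf") would then return the uploaded-file object, whose id is the file_id used in the next step.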
You should see a response like this:
Uploaded file:
{
'id': '1e76cfd9-ba99-49b2-a8fe-2c8efaad2649',
'filename': 'file-20240815-7UvOUQ-invoice_example.pdf',
'bytes': 62430,
'purpose': 'assistants',
'created_at': '2024-08-15T02:22:06.716130',
'object': 'file'
}
Submit the Invoice Parsing Job
Submit the uploaded file (via its file_id) to the /v1/document/generate endpoint to start the invoice parsing job. This endpoint supports PDF files and images, and submits the job to a queue for processing (batch=True).
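A hedged sketch of the submission request follows. The file_id and batch fields are taken from the description above. Everything else is an assumption made for illustration: the base URL, the environment variable names, the X-API-Key header, and the JSON request body format. The endpoint may also require fields this guide does not mention, so the hypothetical `extra` parameter lets you pass them through.

```python
import json
import os
import urllib.request

# Assumptions: base URL, env var names, and auth header are placeholders.
API_BASE = os.environ.get("VLM_API_BASE", "https://api.example.com")
API_KEY = os.environ.get("VLM_API_KEY", "")


def build_generate_request(file_id, batch=True, extra=None):
    """Build (but do not send) the POST request for /v1/document/generate."""
    payload = {"file_id": file_id, "batch": batch}
    # Any additional fields the API requires can be merged in here.
    payload.update(extra or {})
    return urllib.request.Request(
        f"{API_BASE}/v1/document/generate",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
    )


def submit_invoice_job(file_id):
    """Send the request and return the job object (id, status, response)."""
    with urllib.request.urlopen(build_generate_request(file_id)) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the network.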
You should see a response like this:
Invoice parsing job submitted:
{
"id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
"created_at": "2024-08-15T02:22:09.157788",
"response": null,
"status": "pending"
}
Fetch the Results
Use the /v1/document/{request_id} endpoint to fetch the results of the invoice parsing job. The results of the extraction job will be in JSON format under the response field.
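Because the job is queued, the result is usually not available immediately, so a client typically polls this endpoint until the status changes. The sketch below assumes a GET request, a placeholder base URL and X-API-Key header, and only the two status values shown in this guide ("pending" and "completed"). The polling interval and timeout are arbitrary illustrative defaults, and any other status the API might return (for example, a failure state) is not handled specially here.

```python
import json
import os
import time
import urllib.request

# Assumptions: base URL, env var names, and auth header are placeholders.
API_BASE = os.environ.get("VLM_API_BASE", "https://api.example.com")
API_KEY = os.environ.get("VLM_API_KEY", "")


def fetch_job(request_id):
    """GET /v1/document/{request_id} and return the parsed job object."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/document/{request_id}",
        headers={"X-API-Key": API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(request_id, fetch=fetch_job, interval=2.0,
                    timeout=120.0, sleep=time.sleep):
    """Poll until the job reports status "completed", then return the job.

    Raises TimeoutError if the job has not completed within `timeout` seconds.
    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(request_id)
        if job.get("status") == "completed":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {request_id} did not complete in {timeout}s")
        sleep(interval)
```

For example, wait_for_result(job["id"])["response"] would give the extracted invoice fields once the job finishes.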
Illustrative Examples
Here is an example of the structured JSON output that VLM-1 can extract from an invoice. Once the job has completed, you should see a response like this:
{
"id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
"created_at": "2024-08-15T02:22:09.157788",
"status": "completed",
"response": {
"invoice_id": "79BBD516-0005",
"period_start": null,
"period_end": null,
"invoice_issue_date": "2024-01-10",
"invoice_due_date": "2024-02-09",
"order_id": null,
"customer_id": null,
"issuer": "Typographic",
"issuer_address": {
"street": "1 Grand Canal St Lower",
"city": "Dublin",
"state": "Co. Dublin",
"postal_code": "D04 Y7R5",
"country": "Ireland"
},
"customer": "French Customer",
"customer_email": null,
"customer_phone": "+33 1 23 45 67 89",
"customer_billing_address": {
"street": "5 Avenue Anatole France",
"city": "Champ de Mars",
"state": "Paris",
"postal_code": "75007",
"country": "France"
},
"customer_shipping_address": null,
"items": [
{
"description": "Line Item 1",
"quantity": 1,
"currency": "EUR",
"unit_price": 10.0,
"total_price": 10.0
},
{
"description": "Line Item 2",
"quantity": 1,
"currency": "EUR",
"unit_price": 5.0,
"total_price": 5.0
}
],
"subtotal": 15.0,
"tax": 0.0,
"total": 15.0,
"currency": "EUR",
"notes": "[1] Tax to be paid on reverse charge basis",
"others": {
"due_amount": 15.0,
"vat_number": "FRAB123456789",
"support_email": "support@typographic.com",
"contact_phone": "+353123456789"
}
}
}
As the example shows, VLM-1 extracts detailed information from the invoice, including issuer and customer details, line items, issue and due dates, and totals. This structured data can be integrated into financial systems and accounting software, or used for automated invoice processing.
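As one illustration of automated processing, the extracted fields can be sanity-checked before they are passed downstream. The sketch below uses the field names from the example response above. The helper name check_invoice_totals, the tolerance, and the arithmetic rules themselves (quantity times unit price equals the line total, line totals sum to the subtotal, subtotal plus tax equals the total) are assumptions that will not hold for every invoice, for instance one with discounts or shipping charges.

```python
import math


def check_invoice_totals(invoice, tol=0.01):
    """Return a list of human-readable inconsistencies (empty if none found).

    Checks are skipped whenever a field they need is missing or null.
    """
    problems = []
    items = invoice.get("items") or []

    def close(a, b):
        return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)

    # Per-line check: quantity * unit_price should match total_price.
    for index, item in enumerate(items):
        qty, unit, line = (item.get(k) for k in ("quantity", "unit_price", "total_price"))
        if None not in (qty, unit, line) and not close(qty * unit, line):
            problems.append(f"item {index}: {qty} x {unit} != {line}")

    # Subtotal check: only when every line item has a total_price.
    subtotal = invoice.get("subtotal")
    line_totals = [item.get("total_price") for item in items]
    if subtotal is not None and items and None not in line_totals:
        if not close(sum(line_totals), subtotal):
            problems.append(f"items sum to {sum(line_totals)}, subtotal is {subtotal}")

    # Grand total check: subtotal + tax should match total.
    tax, total = invoice.get("tax"), invoice.get("total")
    if None not in (subtotal, tax, total) and not close(subtotal + tax, total):
        problems.append(f"subtotal {subtotal} + tax {tax} != total {total}")

    return problems


example = {
    "items": [
        {"description": "Line Item 1", "quantity": 1, "unit_price": 10.0, "total_price": 10.0},
        {"description": "Line Item 2", "quantity": 1, "unit_price": 5.0, "total_price": 5.0},
    ],
    "subtotal": 15.0,
    "tax": 0.0,
    "total": 15.0,
}
print(check_invoice_totals(example))  # prints [] for the example invoice
```

A non-empty list is a signal to route the invoice to manual review rather than proof that the extraction is wrong.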
Get Started with our Document -> JSON API
Head over to our Document -> JSON API to start building your own document processing pipeline with VLM-1. Sign up for API access to get started.