GraphQL

One of the most powerful features of Vision Language Models (VLMs) is their ability to reason about complex queries and answer with only the relevant data. Traditional OCR-based methods extract and process every single character; VLMs can instead be asked for exactly the information you need, and that is the basis of our GraphQL-based query mechanism.

Why GraphQL?

If you are not familiar with GraphQL, it is a query language for APIs that lets you specify exactly which fields you want from a given schema. Instead of always receiving the full JSON response and post-processing it to pull out the fields you need, you request only the data points relevant to your application. There is a direct analog in how LLMs are queried today: you can specify the exact fields you want returned in a structured JSON response.
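To make the contrast concrete, here is a minimal sketch of the client-side post-processing step that field selection makes unnecessary. The sample response, its values, and the select_fields helper are all invented for illustration and are not part of any API.

```python
import json

# Hypothetical full response; field names and values are made up for illustration.
full_response = {
    "invoice_number": "INV-001",
    "issue_date": "2024-01-15",
    "due_date": "2024-02-14",
    "total_amount": 1250.0,
    "line_items": [
        {"description": "Widget", "quantity": 10, "unit_price": 100.0},
        {"description": "Shipping", "quantity": 1, "unit_price": 250.0},
    ],
}


def select_fields(data, selection):
    """Keep only the keys named in `selection`, recursing into nested dicts."""
    result = {}
    for key, sub in selection.items():
        if key not in data:
            continue  # silently skip fields the response does not contain
        value = data[key]
        if isinstance(sub, dict) and isinstance(value, dict):
            result[key] = select_fields(value, sub)
        else:
            result[key] = value
    return result


# Without field selection, this trimming happens after the full payload arrives.
wanted = {"invoice_number": True, "total_amount": True}
trimmed = select_fields(full_response, wanted)
print(trimmed)  # {'invoice_number': 'INV-001', 'total_amount': 1250.0}

# The trimmed payload serializes to fewer bytes than the full one.
print(len(json.dumps(trimmed)) < len(json.dumps(full_response)))  # True
```

With a GraphQL-style query, the selection is sent along with the request, so the unwanted fields are never produced or transferred in the first place.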

Querying VLMs with GraphQL

We take this one step further and enable the same query mechanism for Vision Language Models (VLMs). VLM Run’s GraphQL capability lets you extract only the specific fields you need from complex schemas, which makes querying and document ETL more efficient. Because you control precisely what is extracted, the server avoids the overhead of extracting unnecessary details, and less data is transferred over the network. The result is a more efficient and scalable way to extract data (i.e., ETL) from complex unstructured data.

Benefits of GraphQL

Improved Performance: Extract only the data fields you need (unlike OCR-based methods), reducing server-side computational overhead.
Reduced Bandwidth: Minimize network traffic by receiving smaller, targeted responses.
Flexible Data Selection: Dynamically adjust which fields to extract based on your needs.
Hierarchical Queries: Select nested fields with intuitive syntax.

A Concrete End-to-End Example

Let’s say you have an invoice PDF document that contains a table of data with an extensive list of fields (such as line items, tax, total, etc.). You can see the official schema we use for invoices here. However, for your use case, you only need to extract the most important fields such as:

Invoice Number: The number of the invoice
Issue Date: The date of the invoice
Due Date: The due date of the invoice
Total Amount: The total amount of the invoice

Since we have already defined the schema for invoices, you can simply use it as a reference and select the fields you specifically need in the following GraphQL query:

{
    invoice_number
    issue_date
    due_date
    total_amount
}

Now that you have defined the GraphQL query, pass it as the gql_stmt parameter of the GenerationConfig object that you supply to the generate method.

from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig

# Create the client with your API key
client = VLMRun(api_key="...")

# Request only the four selected invoice fields via gql_stmt
response = client.document.generate(
    file=Path("path/to/invoice.pdf"),
    domain="document.invoice",
    config=GenerationConfig(
        gql_stmt="{invoice_number issue_date due_date total_amount}"
    )
)
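The multi-line query shown earlier and the one-line string passed to gql_stmt differ only in whitespace, which GraphQL treats as insignificant. As a minimal sketch, a query written in the readable multi-line form can be collapsed to a single line before it is passed along. The compact_query helper is hypothetical and not part of the vlmrun SDK, and it assumes the query contains no string literals whose internal whitespace matters.

```python
def compact_query(query: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return " ".join(query.split())


query = """
{
    invoice_number
    issue_date
    due_date
    total_amount
}
"""

gql_stmt = compact_query(query)
print(gql_stmt)  # { invoice_number issue_date due_date total_amount }
```

The resulting string selects the same four fields as the hand-written one-liner in the example above.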

Extracting Nested Fields with GraphQL

GraphQL’s hierarchical query structure lets you extract deeply nested fields from complex document schemas. For instance, suppose you need not only top-level invoice metadata (such as invoice_number, issue_date, and due_date) but also a specific nested attribute, like the postal_code within the customer_billing_address object. A single declarative GraphQL query covers both:

{
  invoice_number
  issue_date
  due_date
  customer_billing_address {
    postal_code
  }
}

This approach uses GraphQL’s ability to traverse and select arbitrary subfields within a schema, so that only the minimal, application-relevant data is extracted from the model’s output. Both server-side post-processing and network payload size shrink as a result, which matters most in high-throughput ETL pipelines and latency-sensitive applications. By building your data extraction workflows around GraphQL queries, you can enforce strict data contracts, make better use of resources, and build robust, scalable systems on top of VLM Run’s document intelligence capabilities.
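When the set of fields is decided at runtime, the nested query string can be generated rather than written by hand. The following is a sketch under stated assumptions: build_gql_stmt and its list-and-dict field spec are hypothetical conventions chosen for this example and are not part of the vlmrun SDK. The output is a plain GraphQL selection string of the same shape as the queries shown above.

```python
def build_gql_stmt(fields):
    """Render a nested field spec as a compact GraphQL selection string.

    `fields` is a non-empty list whose items are either field names (str)
    or dicts mapping a parent field name to its own list of subfields.
    """
    if not fields:
        raise ValueError("a GraphQL selection needs at least one field")
    parts = []
    for field in fields:
        if isinstance(field, str):
            parts.append(field)
        elif isinstance(field, dict):
            # Each dict entry becomes `parent {child ...}`, built recursively.
            for name, subfields in field.items():
                parts.append(f"{name} {build_gql_stmt(subfields)}")
        else:
            raise TypeError(f"unsupported field spec: {field!r}")
    return "{" + " ".join(parts) + "}"


stmt = build_gql_stmt([
    "invoice_number",
    "issue_date",
    "due_date",
    {"customer_billing_address": ["postal_code"]},
])
print(stmt)
# {invoice_number issue_date due_date customer_billing_address {postal_code}}
```

The generated string could then be supplied as gql_stmt in the same way as the hand-written query in the earlier example.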

Try our Document -> JSON API today

Head over to our Document -> JSON API to start building your own document processing pipeline with VLM Run. Sign up for access on our platform.
