Why GraphQL?
If you are not familiar with GraphQL, it is a query language for APIs that lets you specify exactly which fields you want from a given schema. Instead of always receiving the full JSON response and post-processing it to extract the fields you need, you request only the data points relevant to your application. Structured extraction with LLMs offers a direct analog: you can specify the exact fields you want returned in a structured JSON response.
Querying VLMs with GraphQL
We take this one step further and enable the same query mechanism for Vision Language Models (VLMs). VLM Run’s GraphQL capability lets you extract only the specific fields you need from complex schemas, which improves efficiency for querying and document ETL processes. Because you control precisely what data is extracted, the server avoids the processing overhead of extracting unnecessary details and less data is transferred over the network, making extraction (i.e. ETL) from complex unstructured data more efficient and scalable.
Benefits of GraphQL
- Improved Performance: Extract only the data fields you need (unlike OCR-based methods), reducing server-side computational overhead
- Reduced Bandwidth: Minimize network traffic by receiving smaller, targeted responses
- Flexible Data Selection: Dynamically adjust which fields to extract based on your needs
- Hierarchical Queries: Select nested fields with intuitive syntax
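To make the bandwidth and hierarchical-selection points concrete, here is a minimal local sketch of what field selection does to a payload. It is not the VLM Run implementation (with the API, selection happens server-side); the `select_fields` helper, the selection format, and the sample payload are all hypothetical and exist only for illustration.

```python
import json
from typing import Any, Dict


def select_fields(data: Dict[str, Any], selection: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keys named in `selection`; a nested dict selects sub-fields."""
    result: Dict[str, Any] = {}
    for name, sub in selection.items():
        if name not in data:
            continue  # requested field is absent from the payload
        value = data[name]
        if isinstance(sub, dict) and isinstance(value, dict):
            # Nested object: recurse into the sub-selection.
            result[name] = select_fields(value, sub)
        elif isinstance(sub, dict) and isinstance(value, list):
            # List of objects: apply the sub-selection to each element.
            result[name] = [
                select_fields(item, sub) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            # Leaf field: copy the value as-is.
            result[name] = value
    return result


# Made-up sample payload, for illustration only.
full_response = {
    "invoice_number": "INV-0001",
    "issue_date": "2024-01-15",
    "due_date": "2024-02-14",
    "customer_billing_address": {
        "street": "1 Example Street",
        "city": "Exampleville",
        "postal_code": "00000",
    },
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0},
        {"description": "Gadget", "quantity": 1, "unit_price": 25.0},
    ],
}

# Hypothetical selection format: True marks a leaf, a dict marks nesting.
selection = {
    "invoice_number": True,
    "customer_billing_address": {"postal_code": True},
}

subset = select_fields(full_response, selection)
print(subset)
# The selected payload serializes to fewer bytes than the full one.
print(len(json.dumps(subset)), "<", len(json.dumps(full_response)))
```

The same idea applies at the API level: the narrower the selection, the less data has to be produced and sent back.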
A Concrete End-to-End Example
Let’s say you have an invoice PDF that contains a table with an extensive list of fields (such as line items, tax, total, etc.). You can see the official schema we use for invoices here. However, for your use case, you only need to extract the most important fields, such as:
- Invoice Number: The number of the invoice
- Issue Date: The date of the invoice
- Due Date: The due date of the invoice
- Total Amount: The total amount of the invoice
To extract only these fields, pass a GraphQL query as the gql_stmt parameter to the GenerationConfig object in the generate method.
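As a rough sketch, the query for the four fields listed above could be written as a Python string like the one below. The field names follow the snake_case style used elsewhere in this guide, but `total_amount` in particular is an assumption and should be checked against the invoice schema. The `top_level_fields` helper is hypothetical and only sanity-checks the string locally; it is not part of any SDK.

```python
import re
from typing import List

# GraphQL shorthand query selecting four top-level invoice fields.
# NOTE: `total_amount` is an assumed field name; verify it against the schema.
GQL_STMT = """
{
  invoice_number
  issue_date
  due_date
  total_amount
}
"""


def top_level_fields(stmt: str) -> List[str]:
    """Return the field names of a flat (non-nested) selection set."""
    body = stmt.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a selection set wrapped in braces")
    inner = body[1:-1]
    if "{" in inner or "}" in inner:
        raise ValueError("nested selections are not handled by this helper")
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", inner)


print(top_level_fields(GQL_STMT))
# GQL_STMT is the value that would be supplied as the gql_stmt parameter
# of GenerationConfig; the client call itself is omitted here.
```

Keeping the query in a named constant makes it easy to review which fields a pipeline depends on and to adjust the selection later.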
Extracting Nested Fields with GraphQL
GraphQL’s hierarchical query structure enables precise extraction of deeply nested fields from complex document schemas. For instance, consider a scenario where you require not only top-level invoice metadata (such as invoice_number, issue_date, and due_date) but also a specific nested attribute like the postal_code within the customer_billing_address object. This can be accomplished with a single, declarative GraphQL query: