Foundation vision models support chat over visual inputs, but automation needs reliable, machine-validated output. Agent chat completions let you define the expected structure up front and get consistently formatted JSON back – either loosely with json_object or strictly via json_schema.

Extract Structured JSON with vlm-agent-1

Here’s an example of using the agent chat completions endpoint to extract typed JSON directly from user prompts and files.
from openai import OpenAI

# Initialize the OpenAI client with the custom base URL
client = OpenAI(
    api_key="<VLMRUN_API_KEY>",
    base_url="https://agent.vlm.run/v1/openai"
)

# Ask the agent for structured output using a loose JSON object
response = client.chat.completions.create(
    model="vlm-agent-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract invoice number, dates, totals, and vendor in JSON."},
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.jpg"}}
            ]
        }
    ],
    response_format={"type": "json_object"},
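)

# The extracted JSON is returned as a string in the assistant message content
print(response.choices[0].message.content)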

JSON Response

{
  "id": "chatcmpl_abc123xyz",
  "object": "chat.completion",
  "model": "vlm-agent-1:auto",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"invoice_number\":\"INV-2024-001\",\"date\":\"2024-09-15\",\"total_amount\":1250.00,\"vendor_name\":\"Acme Corporation\"}"
      },
      "finish_reason": "stop"
    }
  ]
}
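
Note that the extracted data arrives as a JSON-encoded string in message.content, not as a nested object. A minimal sketch of decoding it, using the content string from the sample response above (the variable names are illustrative):

```python
import json

# `content` stands in for response.choices[0].message.content;
# the value is copied from the sample response above.
content = (
    '{"invoice_number":"INV-2024-001","date":"2024-09-15",'
    '"total_amount":1250.00,"vendor_name":"Acme Corporation"}'
)

# Decode the JSON string into a regular Python dict
invoice = json.loads(content)

print(invoice["invoice_number"])  # INV-2024-001
print(invoice["total_amount"])    # 1250.0
```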

Response Format Types

As with OpenAI’s json_object and json_schema response formats, set response_format to one of the following types to extract structured JSON from the agent’s response.
Type         Description
json_object  Valid JSON object without a specific schema
json_schema  Strict JSON conforming to the provided schema
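
For the strict variant, here is a sketch of what a json_schema response format could look like. It assumes the agent endpoint accepts the same nesting as OpenAI's API (a "json_schema" key holding "name", "schema", and "strict"); the exact payload shape for vlm-agent-1 may differ, so check it against the API reference. The schema and the helper make_json_schema_format are hypothetical, modeled on the invoice fields from the example above.

```python
import json

# Hypothetical JSON Schema for the invoice fields used in the example above
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"},
    },
    "required": ["invoice_number", "date", "total_amount", "vendor_name"],
    "additionalProperties": False,
}


def make_json_schema_format(name, schema):
    """Build a response_format dict in OpenAI's json_schema shape (assumed here)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


response_format = make_json_schema_format("invoice", INVOICE_SCHEMA)
print(json.dumps(response_format, indent=2))
```

The resulting dict would be passed as response_format=response_format in the client.chat.completions.create call, in place of {"type": "json_object"}.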

Best Practices

  • Clear system prompts: Define role and output format in the system message
  • Prefer schemas for automation: Use json_schema for guaranteed structure
  • Control randomness: Use a low temperature (0.0–0.3) for more deterministic outputs
  • Validate responses: Parse/validate JSON and handle errors gracefully
  • Keep history concise: Shorter message histories improve latency and reliability
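
The practices above can be sketched in a few lines. This is one possible arrangement, not a prescribed one: build_request, parse_invoice, REQUIRED_FIELDS, the system prompt wording, and the max_turns cutoff are all illustrative assumptions, and the sketch assumes the endpoint honors the temperature parameter.

```python
import json

# Hypothetical required fields and their expected Python types
REQUIRED_FIELDS = {"invoice_number": str, "total_amount": (int, float)}


def build_request(history, max_turns=4):
    """Assemble request kwargs: system prompt, trimmed history, low temperature."""
    system = {
        "role": "system",
        "content": "You are an invoice extraction agent. Reply with a single JSON object.",
    }
    return {
        "model": "vlm-agent-1:auto",
        # Keep only the most recent turns to keep the history concise
        "messages": [system] + list(history[-max_turns:]),
        "temperature": 0.0,
        "response_format": {"type": "json_object"},
    }


def parse_invoice(content):
    """Parse and validate the message content; return (data, error)."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            return None, f"missing field: {field}"
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            return None, f"wrong type for field: {field}"
    return data, None


data, error = parse_invoice('{"invoice_number": "INV-2024-001", "total_amount": 1250.0}')
print(data, error)
```

The kwargs from build_request can be unpacked into client.chat.completions.create(**kwargs). Returning an (data, error) pair rather than raising lets the caller decide whether to retry, fall back, or log the failure.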