With our new OpenAI-compatible API, you can use the OpenAI Python SDK to interact with VLM Run Agents. This allows developers to switch between the OpenAI and VLM Run APIs without changing their application code.
Our VLM Agents are fully compatible with the OpenAI API. Notably, our API also supports a range of multi-modal features that OpenAI does not currently support. Our OpenAI-compatible endpoint is available at https://agent.vlm.run/v1/openai.
To use the VLM Run Agents API, you simply override the default endpoint and API key when initializing the OpenAI Python SDK.

OpenAI Client Configuration

Override the default endpoint and API key by initializing the OpenAI client with the following configuration:
import openai

client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)
Alternatively, you can also set the following environment variables to achieve the same effect:
export OPENAI_BASE_URL="https://agent.vlm.run/v1/openai"
export OPENAI_API_KEY="<VLMRUN_API_KEY>"  # https://app.vlm.run/
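With these variables set, the OpenAI client picks them up automatically, so it can be constructed without any arguments:
import openai

# Reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment
client = openai.OpenAI()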

Usage 1: Basic Chat Completion

Once you have set the endpoint and API key, you can use the OpenAI Python SDK as you normally would. The only change to the client.chat.completions.create call is the optional extra_body field, which lets you specify the domain and additional request metadata. For example:
!pip install vlmrun

import openai
from PIL import Image
from vlmrun.common.image import encode_image

# Initialize the OpenAI client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai", api_key="<VLMRUN_API_KEY>"
)

# Load the image to send with the request
image = Image.open("image.jpg")

# Example: Chat completion with an image input
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": encode_image(image), "detail": "auto"}},
    ]}
]

# Perform chat completion
chat_completion = client.chat.completions.create(
    model="vlm-agent-1",
    messages=messages,
    temperature=0,
    extra_body={"session_id": "<session id>"},  # optional session id for persistence
)
print(chat_completion.choices[0].message.content)

Usage 2: Chat Completion with Structured Outputs
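Structured outputs work the same way: pass a JSON Schema via the response_format field, and validate the returned JSON with your Pydantic model, as shown below.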

!pip install vlmrun

import openai
from PIL import Image
from pydantic import BaseModel, Field
from vlmrun.common.image import encode_image


class ImageCaption(BaseModel):
    caption: str = Field(..., description="Detailed caption of the scene")
    tags: list[str] = Field(..., description="Tags that describe the image")


# Initialize the OpenAI client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai", api_key="<VLMRUN_API_KEY>"
)

# Load the image to send with the request
image = Image.open("image.jpg")

# Example: Chat completion with an image input
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": encode_image(image), "detail": "auto"}},
    ]}
]

# Perform chat completion (with JSON Schema)
chat_completion = client.chat.completions.create(
    model="vlm-agent-1",
    messages=messages,
    response_format={"type": "json_schema", "schema": ImageCaption.model_json_schema()},
)
print(chat_completion.choices[0].message.content)
>> {"caption": "...", "tags": [...]}
print(ImageCaption.model_validate_json(chat_completion.choices[0].message.content))
>> ImageCaption(caption="...", tags=[...])

Usage 3: Basic Chat Completion with Streaming
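Streaming works just as it does with the OpenAI API: pass stream=True and iterate over the returned chunks.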

# Same as above
# ...

# Perform chat completion (with streaming)
chat_completion = client.chat.completions.create(
    model="vlm-agent-1",
    messages=messages,
    temperature=0,
    stream=True,
)
for chunk in chat_completion:
    # The final chunk may carry an empty delta; guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Usage 4: Mixed-Modality Inputs
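You can mix modalities in a single request by first uploading files via the Files API and then referencing them as input_file objects in the message content.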

!pip install vlmrun

import openai
from pathlib import Path

# Initialize the OpenAI client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai", api_key="<VLMRUN_API_KEY>"
)

# Upload an image and document to the VLM Run Agents API
file1 = client.files.create(file=Path("image.jpg"), purpose="assistants")
file2 = client.files.create(file=Path("document.pdf"), purpose="assistants")

# Perform chat completion (with mixed-modality inputs)
chat_completion = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Summarize the attached image and document."},
            {"type": "input_file", "file_id": file1.id},
            {"type": "input_file", "file_id": file2.id},
        ]}
    ],
)
print(chat_completion.choices[0].message.content)

Extra Body

The extra_body field lets you pass additional request metadata to the VLM Run Agents API, beyond what the OpenAI Python SDK supports, nested under the vlmrun key. Use it to set fields such as allow_training, environment, etc. For example, the following code specifies the request metadata:
chat_completion = client.chat.completions.create(
    model="vlm-agent-1",
    messages=messages,
    temperature=0,
    extra_body={
        "vlmrun": {
            "metadata": { # optional dictionary of request metadata
                "environment": "dev",
                "allow_retention": False,
            },
        }
    }
)

Request Metadata

For more details on the request metadata, please refer to the Request Metadata section of the API reference.
The VLM Run Agents API supports submitting request metadata along with the chat-completions request via the extra_body keyword argument. The following vlmrun.metadata fields are supported:
  • environment (dev, staging, prod): The environment in which the request is made, which can be useful for tracking requests across environments. Defaults to prod.
  • session_id: A string identifier for the session, which can be used to track requests across sessions.
  • allow_training: Flags the request as a potential candidate for our training dataset. If true, the request may be used to train our base models; if false, it is used for inference only. Defaults to true.
  • allow_retention: Flags the request as a potential candidate for data retention. If true, the request data may be retained; if false, it is used for inference only. Defaults to true.
  • allow_logging: Flags the request as a potential candidate for logging. If true, the request may be logged; if false, it is used for inference only. Defaults to true.
  • extra: A dictionary of extra metadata that can be used to track the request.
chat_completion = client.chat.completions.create(
    model="vlm-1",
    messages=messages,
    temperature=0,
    extra_body={
        "vlmrun": {
            "domain": "...",
            "metadata": {
                "environment": "dev", # "dev", "staging", "prod"
                "session_id": "...", # a string identifier for the session, non-unique
                "allow_training": False, # if true, the request may be used for training our base models
            }
        }
    }
)

Token Usage

Every chat completion response includes token usage statistics, which can be useful for monitoring your usage and costs when using the API. We refer the user to the VLM Run Pricing page for more information on pricing and usage.
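For example, assuming the response mirrors the OpenAI SDK's CompletionUsage shape, you can inspect per-request token counts directly:
# Inspect token usage on a (non-streaming) chat completion response
usage = chat_completion.usage
print(f"prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens}, total: {usage.total_tokens}")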

Compatibility Differences

Unlike the OpenAI API, the VLM Run Agents API differs in the following ways:
  • messages can additionally contain input_file objects {"type": "input_file", "file_id": "<file_id>"}, where file_id is the id of a file uploaded to the VLM Run Agents API. These are especially useful for processing large files such as videos and documents.
  • max_tokens: The max_tokens field in chat.completions.create is currently not respected by our server; if the output exceeds the limit, the server will still return the full output.
  • logprobs, logit_bias, top_logprobs, presence_penalty, frequency_penalty, n, stop: These fields are not currently supported by the VLM Run Agents API. We will be adding support for these features in the near future.