Handle chat completion requests.
Supports both authenticated and public (unauthenticated) requests. Guest users are limited to 10 chats/day per browser id. Authenticated free users are not subject to a daily chat cap after they sign in.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Request payload for the OpenAI chat completions API for vlmrun-orion-1
Messages to complete
ID of the completion
VLM Run Agent model to use for completion
vlmrun-orion-1, vlmrun-orion-1:lite, vlmrun-orion-1:auto, vlmrun-orion-1:fast, vlmrun-orion-1:pro Maximum number of tokens to generate
Number of completions to generate
Temperature of the sampling distribution
Cumulative probability of parameter highest probability vocabulary tokens to keep for nucleus sampling
Number of highest probability vocabulary tokens to keep for top-k-filtering
Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens
Whether to stream the response or not
Whether to generate previews for the response or not
Response format for JSON schema mode as per Fireworks AI specification.
Session UUID for persisting the chat history
Additional metadata for the request (e.g., dataset_name, experiment_id, etc.)
List of agent skills to enable for this request.
List of tool categories to enable for this request. Available categories: core, document, image, image-gen, video, viz, web, world-gen. When specified, only tools from these categories will be available. For streaming requests: If None, the router agent automatically selects tools. For non-streaming requests: Defaults to 'core' toolset if not specified.
Available toolsets for agent tool selection.
Each toolset represents a category of related tools that can be enabled together for an agent execution.
core, document, image, image-gen, video, viz, web, world-gen List of model-specific tool providers to enable for this request. Available models: depth-anything-3, google-gemini-3-analysis, google-gemini-3-image, google-gemini-robotics-er, google-veo-3.1, meta-sam2, meta-sam3, meta-sam3d, microsoft-omniparser-v2, nvidia-cosmos-reason-2-8b, qwen-qwen3-vl-8b, vlm-dots-ocr. Multiple models can be selected — their tools are merged and deduplicated. Model tools are added on top of the toolset-selected tools.
Available models for agent tool selection.
Each model represents a specialized capability backed by a specific model deployment. Multiple models can be selected simultaneously — pass a list and the tools are merged and deduplicated.
Usage in vlmrun.yaml::
model: vlmrun-orion-1:auto
toolsets:
- core
- image
models:
- nvidia-cosmos-reason-2-8b
- meta-sam3google-gemini-3-image, google-gemini-3-analysis, google-gemini-robotics-er, google-veo-3.1, microsoft-omniparser-v2, qwen-qwen3-vl-8b, meta-sam2, meta-sam3, meta-sam3d, depth-anything-3, vlm-dots-ocr, nvidia-cosmos-reason-2-8b Successful Response