Handle chat completion requests.
Supports both authenticated and public (unauthenticated) requests. Public requests are rate-limited to 3 calls per hour per fingerprint (IP + user-agent). Authenticated free users are limited to 20 chats per day.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Request payload for the OpenAI chat completions API for vlm-agent-1
Messages to complete
ID of the completion
VLM Run Agent model to use for completion
vlmrun-orion-1, vlmrun-orion-1:auto, vlmrun-orion-1:fast, vlmrun-orion-1:pro Maximum number of tokens to generate
Number of completions to generate
Temperature of the sampling distribution
Cumulative probability of parameter highest probability vocabulary tokens to keep for nucleus sampling
Number of highest probability vocabulary tokens to keep for top-k-filtering
Include the log probabilities on the logprobs most likely tokens, as well the chosen tokens
Whether to stream the response or not
Whether to generate previews for the response or not
Response format for JSON schema mode as per Fireworks AI specification.
Session UUID for persisting the chat history
Additional metadata for the request (e.g., dataset_name, experiment_id, etc.)
Successful Response