Artifacts are binary objects generated during agent interactions, such as images, videos, audio files, and documents. When agents perform operations like image generation, face blurring, video trimming, or document processing, the results are stored as artifacts that can be retrieved using object references.

Object References

Agent responses return object references (refs) instead of raw binary data. Each reference is a string identifier with a fixed format: a 3-5 letter type prefix, an underscore, and a 6-character hexadecimal suffix (e.g., img_a1b2c3).
| Artifact Type  | Prefix | Reference Type | Python Return Type      |
|----------------|--------|----------------|-------------------------|
| Image          | img_   | ImageRef       | PIL.Image.Image         |
| Video          | vid_   | VideoRef       | Path (mp4)              |
| Audio          | aud_   | AudioRef       | Path (mp3)              |
| Document       | doc_   | DocumentRef    | Path (pdf)              |
| Reconstruction | recon_ | ReconRef       | Path (spz)              |
| URL            | url_   | UrlRef         | Path (any of the above) |
| Array          | arr_   | ArrayRef       | np.ndarray              |
Import reference types from the SDK:
from vlmrun.types import ImageRef, VideoRef, AudioRef, DocumentRef, ReconRef, UrlRef
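As a quick sanity check, a ref string can be validated against the documented format. The pattern below is a minimal sketch derived only from the format described above; it is not an SDK constant:
import re

# Matches the documented ref format: a 3-5 letter type prefix,
# an underscore, and a 6-character hexadecimal suffix.
# This pattern is an assumption based on the format above, not SDK API.
REF_PATTERN = re.compile(r"^[a-z]{3,5}_[0-9a-f]{6}$")

assert REF_PATTERN.match("img_a1b2c3")
assert REF_PATTERN.match("recon_0f9e8d")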

Chat Completions Artifacts

Chat completions artifacts are scoped to a session and retrieved using the session_id returned from chat completions along with the object_id from the response.

Retrieving an Artifact

To retrieve an artifact for a specific chat completion, you need the session_id (from the chat completion response) and the object_id (from the structured JSON response, where it is returned as a Ref type).
from pydantic import BaseModel, Field
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.types import ImageRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Define a response model with an ImageRef field
class BlurredImageResponse(BaseModel):
    image: ImageRef = Field(..., description="The blurred image")

# Make a chat completion request
response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Blur all the faces in this image"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": BlurredImageResponse.model_json_schema()
    }
)

# Parse the response
result = BlurredImageResponse.model_validate_json(response.choices[0].message.content)

# Retrieve the artifact using session_id and object_id
blurred_image: Image.Image = client.artifacts.get(
    session_id=response.session_id,
    object_id=result.image.id
)

# Display or save the image
blurred_image.save("blurred_output.jpg")

Handling Multi-modal Artifacts

Multi-modal artifacts such as videos are downloaded and cached locally as files.
For now, a video artifact can be either a VideoRef or a UrlRef (expressed as the union VideoRef | UrlRef); support for more artifact types is planned. See the sketch after the example below for handling the union.
Python SDK
from pathlib import Path
from pydantic import BaseModel, Field
from vlmrun.client import VLMRun
from vlmrun.types import VideoRef, UrlRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

class VideoTrimResponse(BaseModel):
    start_time: str = Field(..., description="Start time of the trimmed segment")
    end_time: str = Field(..., description="End time of the trimmed segment")
    video: VideoRef | UrlRef = Field(..., description="The trimmed video")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Trim this video to the first 10 seconds"},
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": VideoTrimResponse.model_json_schema()
    }
)

result = VideoTrimResponse.model_validate_json(response.choices[0].message.content)

# Retrieve the video artifact - returns a Path to the local file
video_path: Path = client.artifacts.get(
    session_id=response.session_id,
    object_id=result.video.id
)

print(f"Video saved to: {video_path}")
print(f"File size: {video_path.stat().st_size / 1024 / 1024:.2f} MB")

Handling Multiple Artifacts

Agents can return multiple artifacts in a single response:
Python SDK
from typing import List
from pydantic import BaseModel, Field
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.types import ImageRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

class VideoFramesResponse(BaseModel):
    class VideoFrame(BaseModel):
        image: ImageRef = Field(..., description="The video frame image")
        timestamp: str = Field(..., description="Timestamp in HH:MM:SS format")
        description: str = Field(..., description="Description of the scene")

    frames: List[VideoFrame] = Field(..., description="Extracted frames")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract 5 key frames from this video"},
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": VideoFramesResponse.model_json_schema()
    }
)

result = VideoFramesResponse.model_validate_json(response.choices[0].message.content)

# Retrieve all frame artifacts
frames: List[Image.Image] = [
    client.artifacts.get(session_id=response.session_id, object_id=frame.image.id)
    for frame in result.frames
]

for i, (frame, frame_data) in enumerate(zip(frames, result.frames)):
    print(f"Frame {i+1}: [{frame_data.timestamp}] {frame_data.description}")
    frame.save(f"frame_{i+1}.jpg")

Agent Executions Artifacts

To retrieve an artifact for a specific agent execution, you need the execution_id (from the agent execution response) and the object_id (from the structured JSON response, where it is returned as a Ref type).
Python SDK
from pydantic import BaseModel, Field
from PIL import Image

from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionResponse, AgentExecutionConfig, ImageUrl
from vlmrun.types import ImageRef, MessageContent

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Define typed inputs using MessageContent
class ExecutionInputs(BaseModel):
    image: MessageContent = Field(..., description="The input image")

class ImageResponse(BaseModel):
    image: ImageRef = Field(..., description="The processed image")

# Execute an agent with typed inputs
execution: AgentExecutionResponse = client.agent.execute(
    name="image/blur-image",
    inputs=ExecutionInputs(
        image=MessageContent(type="image_url", image_url=ImageUrl(url="https://example.com/photo.jpg"))
    ),
    config=AgentExecutionConfig(
        prompt="Blur the entire image",
        response_model=ImageResponse
    )
)

# Wait for completion
execution = client.executions.wait(execution.id, timeout=180)

# Parse the response and retrieve the artifact
result = ImageResponse.model_validate(execution.response)
image: Image.Image = client.artifacts.get(
    execution_id=execution.id,
    object_id=result.image.id
)
Agent execution artifact retrieval via execution_id is being rolled out. Check the SDK release notes for availability.

Artifact Caching

The Python and Node SDKs automatically cache downloaded artifacts to avoid re-downloading the same files. Artifacts are stored in ~/.vlmrun/artifacts/{session_id}/, with filenames based on the object ID and the appropriate file extension.
# First call downloads the artifact
video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")

# Subsequent calls return the cached file path
video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")  # No download

Common Use Cases

Multi-modal Artifacts

Generate multi-modal artifacts such as images and videos.

Multiple Artifacts

Generate multiple images of a scene (e.g. virtual try-on, video thumbnails, etc.).

Document Processing

Redact sensitive information from documents, and return the processed document as a PDF.
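A redaction workflow follows the same pattern as the image and video examples above, with a DocumentRef field in the response model. A minimal sketch; the document_url content type and the prompt are illustrative assumptions, not confirmed API surface:
from pathlib import Path
from pydantic import BaseModel, Field
from vlmrun.types import DocumentRef

class RedactedDocumentResponse(BaseModel):
    document: DocumentRef = Field(..., description="The redacted PDF")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Redact all personal information in this document"},
                # "document_url" is an assumed content type, by analogy with
                # "image_url" and "video_url" above
                {"type": "document_url", "document_url": {"url": "https://example.com/report.pdf"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": RedactedDocumentResponse.model_json_schema()
    }
)

result = RedactedDocumentResponse.model_validate_json(response.choices[0].message.content)
pdf_path: Path = client.artifacts.get(
    session_id=response.session_id,
    object_id=result.document.id
)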

3D Reconstruction

Generate 3D models from images or videos, and return ply/spz files.

Best Practices

When working with artifacts, keep these guidelines in mind:
  • For large artifacts like videos, the Python and Node SDKs download files to disk rather than loading them into memory, which prevents memory issues when working with large files. Always check the file size before loading video content into memory (see the sketch after this list).
  • Use structured response models with appropriate Ref types (ImageRef, VideoRef, etc.) to ensure type safety and enable IDE autocompletion. The Python and Node SDKs will automatically handle the conversion to the appropriate Python type when retrieving artifacts.
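For the file-size check in the first guideline, a minimal sketch (the 100 MB threshold is an arbitrary assumption):
MAX_IN_MEMORY_BYTES = 100 * 1024 * 1024  # arbitrary threshold, tune for your workload

video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")
if video_path.stat().st_size > MAX_IN_MEMORY_BYTES:
    # Too large to hold in memory; process the file on disk instead
    print(f"Large artifact: {video_path.stat().st_size / 1024 / 1024:.1f} MB")
else:
    data = video_path.read_bytes()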

API Reference

View the complete API reference for artifact retrieval