Artifacts are binary objects generated during agent interactions, such as images, videos, audio files, and documents. When agents perform operations like image generation, face blurring, video trimming, or document processing, the results are stored as artifacts that can be retrieved using object references.

Object References

Agent responses return object references (refs) instead of raw binary data. Each reference is a string identifier that follows a specific format: a 3-5 letter type prefix followed by an underscore and a 6-digit hexadecimal string (e.g., img_a1b2c3).
| Artifact Type  | Prefix | Reference Type | Python Return Type        |
|----------------|--------|----------------|---------------------------|
| Image          | img_   | ImageRef       | PIL.Image.Image           |
| Video          | vid_   | VideoRef       | Path (mp4)                |
| Audio          | aud_   | AudioRef       | Path (mp3)                |
| Document       | doc_   | DocumentRef    | Path (pdf)                |
| Reconstruction | recon_ | ReconRef       | Path (spz)                |
| URL            | url_   | UrlRef         | Path (any of the above)   |
| Array          | arr_   | ArrayRef       | np.ndarray                |
Import reference types from the SDK:
from vlmrun.types import ImageRef, VideoRef, AudioRef, DocumentRef, ReconRef, UrlRef

Retrieving an Artifact

In a Chat Completion

To retrieve a chat completion artifact, use the session_id from the chat response and the object_id (returned as a Ref type) from the JSON result.
from pathlib import Path
from pydantic import BaseModel, Field
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.types import ImageRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Define a response model with an ImageRef field
class BlurredImageResponse(BaseModel):
    image: ImageRef = Field(..., description="The blurred image")

# Make a chat completion request
response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Blur all the faces in this image"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": BlurredImageResponse.model_json_schema()
    }
)

# Parse the response
result = BlurredImageResponse.model_validate_json(response.choices[0].message.content)

# Retrieve the artifact using session_id and object_id
blurred_image: Image.Image = client.artifacts.get(
    session_id=response.session_id,
    object_id=result.image.id
)

# Display or save the image
blurred_image.save("blurred_output.jpg")

In an Agent Execution

To retrieve an artifact for a specific agent execution, use the execution_id from the agent execution response and the object_id (returned as a Ref type) from the JSON result.
from pydantic import BaseModel, Field
from PIL import Image

from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionResponse, AgentExecutionConfig, ImageUrl
from vlmrun.types import ImageRef, MessageContent

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Define typed inputs using MessageContent
class ExecutionInputs(BaseModel):
    image: MessageContent = Field(..., description="The input image")

class ImageResponse(BaseModel):
    image: ImageRef = Field(..., description="The processed image")

# Execute an agent with typed inputs
execution: AgentExecutionResponse = client.agent.execute(
    name="image/blur-image",
    inputs=ExecutionInputs(
        image=MessageContent(type="image_url", image_url=ImageUrl(url="https://example.com/photo.jpg"))
    ),
    config=AgentExecutionConfig(
        prompt="Blur the entire image",
        response_model=ImageResponse
    )
)

# Wait for completion
execution = client.executions.wait(execution.id, timeout=180)

# Parse the response and retrieve the artifact
result = ImageResponse.model_validate(execution.response)
image: Image.Image = client.artifacts.get(
    execution_id=execution.id,
    object_id=result.image.id
)

Retrieving Multiple Artifacts

Agents can return multiple artifacts in a single response:
from typing import List
from pydantic import BaseModel, Field
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.types import ImageRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

class VideoFramesResponse(BaseModel):
    class VideoFrame(BaseModel):
        image: ImageRef = Field(..., description="The video frame image")
        timestamp: str = Field(..., description="Timestamp in HH:MM:SS format")
        description: str = Field(..., description="Description of the scene")

    frames: List[VideoFrame] = Field(..., description="Extracted frames")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract 5 key frames from this video"},
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": VideoFramesResponse.model_json_schema()
    }
)

result = VideoFramesResponse.model_validate_json(response.choices[0].message.content)

# Retrieve all frame artifacts
frames: List[Image.Image] = [
    client.artifacts.get(session_id=response.session_id, object_id=frame.image.id)
    for frame in result.frames
]

for i, (frame, frame_data) in enumerate(zip(frames, result.frames)):
    print(f"Frame {i+1}: [{frame_data.timestamp}] {frame_data.description}")
    frame.save(f"frame_{i+1}.jpg")

Artifact Caching

The Python and Node SDKs automatically cache downloaded artifacts to avoid re-downloading the same files. Artifacts are stored in ~/.vlmrun/artifacts/{session_id}/ with filenames derived from the object ID and the appropriate file extension.
# First call downloads the artifact
video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")

# Subsequent calls return the cached file path
video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")  # No download
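The cache layout above can be sketched as a path helper. This function is not part of the SDK; it just mirrors the documented ~/.vlmrun/artifacts/{session_id}/ convention, which can be handy if you want to check for or clean up cached files yourself.

```python
from pathlib import Path

def cached_artifact_path(session_id: str, object_id: str, ext: str) -> Path:
    # Illustrative only: mirrors the documented cache layout,
    # e.g. ~/.vlmrun/artifacts/<session_id>/<object_id>.<ext>
    return Path.home() / ".vlmrun" / "artifacts" / session_id / f"{object_id}.{ext}"

def is_cached(session_id: str, object_id: str, ext: str) -> bool:
    # True if the artifact has already been downloaded to the local cache
    return cached_artifact_path(session_id, object_id, ext).exists()
```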

Common Use Cases

Multi-modal Artifacts

Generate multi-modal artifacts such as images and videos.

Multiple Artifacts

Generate multiple images of a scene (e.g. virtual try-on, video thumbnails, etc.).

Document Processing

Redact sensitive information from documents, and return the processed document as a PDF.

3D Reconstruction

Generate 3D models from images or videos, and return ply/spz files.

Best Practices

When working with artifacts, keep these guidelines in mind:
  • For large artifacts like videos, the Python and Node SDKs download files to disk rather than loading them into memory. This prevents memory issues when working with large files. Always check the file size before loading video content into memory.
  • Use structured response models with appropriate Ref types (ImageRef, VideoRef, etc.) to ensure type safety and enable IDE autocompletion. The Python and Node SDKs will automatically handle the conversion to the appropriate Python type when retrieving artifacts.
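The file-size check recommended above can be as simple as a stat call before reading a downloaded video into memory. The threshold here is a hypothetical value, not an SDK setting; pick one that fits your environment.

```python
from pathlib import Path

# Hypothetical threshold; tune for your environment.
MAX_IN_MEMORY_BYTES = 100 * 1024 * 1024  # 100 MB

def safe_to_load(path: Path) -> bool:
    # Check the on-disk size before reading a large artifact into memory
    return path.stat().st_size <= MAX_IN_MEMORY_BYTES
```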

API Reference

View the complete API reference for artifact retrieval