vlm-agent-1 can use a set of video-editing tools to trim videos, sample frames, and extract segments. With these tools you can pull out key moments, cut a video down to a specific time range, and sample frames for downstream analysis.

Full video used to demonstrate video tools such as trimming, sampling, and keyframe detection

Example Usage

For each example below, you can pass a Pydantic model's JSON schema through the Structured Outputs API (the response_format parameter). The agent then returns structured JSON, such as video URLs and frame data, that you can validate against the same model.
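The JSON schema constrains the shape of the response, but a plain str field does not by itself guarantee that the value is a usable link. As an optional client-side check, here is a minimal standard-library sketch that walks a parsed response and flags any "url" value that is not an absolute http(s) URL. The helpers collect_urls and invalid_urls are hypothetical and are not part of the VLM Run or OpenAI SDKs.

```python
import json
from urllib.parse import urlparse


def collect_urls(node):
    """Recursively yield every string stored under a "url" key (hypothetical helper)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from collect_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_urls(item)


def invalid_urls(content: str) -> list:
    """Return the "url" values in a JSON response that are not absolute http(s) URLs."""
    payload = json.loads(content)
    bad = []
    for url in collect_urls(payload):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(url)
    return bad


# Hypothetical response content, shaped like a frame-sampling response
content = (
    '{"frames": [{"url": "https://example.com/frame-1.jpg", "timestamp": "00:00:00.000"}, '
    '{"url": "frame-2.jpg", "timestamp": "00:00:05.000"}]}'
)
print(invalid_urls(content))  # ['frame-2.jpg']
```

This only checks syntax; it does not confirm that a URL actually resolves.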

1. Video Frame Sampling

Extract frames at regular intervals or specific timestamps for analysis.
Extract at least 3 frames from the video for thumbnail generation.

Example of 3 frames extracted from the video for thumbnail generation.

import openai
from pydantic import BaseModel, Field
from typing import List

class VideoFrame(BaseModel):
    url: str = Field(..., description="The URL of the extracted frame")
    timestamp: str = Field(..., description="The timestamp of the extracted frame, in HH:MM:SS.MS format")

class VideoSamplingResponse(BaseModel):
    frames: List[VideoFrame] = Field(..., description="List of extracted frames")

# Initialize the client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)

# Extract keyframes for thumbnails
response = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {
            "role": "user",
            # The video is passed as a content part of the user message
            "content": [
                {"type": "text", "text": "Extract keyframes from this video for thumbnail generation, sampling every 5 seconds"},
                {"type": "video_url", "video_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video.transcription/bakery.mp4"}}
            ]
        }
    ],
    response_format={"type": "json_schema", "schema": VideoSamplingResponse.model_json_schema()}
)

# Print the raw JSON response
print(response.choices[0].message.content)
# Example output: {"frames": [{"url": "https://.../frame-1.jpg", "timestamp": "..."}, {"url": "https://.../frame-2.jpg", "timestamp": "00:00:05.000"}, ...]}

# Validate the response against the schema
print(VideoSamplingResponse.model_validate_json(response.choices[0].message.content))
# Example output: VideoSamplingResponse(frames=[VideoFrame(url='https://.../frame-1.jpg', timestamp='...'), VideoFrame(url='https://.../frame-2.jpg', timestamp='00:00:05.000'), ...])
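When you ask for frames "every N seconds", it helps to know which timestamps to expect. The sketch below computes uniform sampling times locally so you can compare them with the timestamps the agent returns. It assumes the HH:MM:SS.MS fields use three-digit milliseconds, as in the sample output above; format_timestamp and uniform_sample_times are hypothetical helpers, and the 17-second duration is made up for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm (assumed three-digit milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def uniform_sample_times(duration_s: float, interval_s: float) -> list:
    """Timestamps at 0, interval, 2*interval, ... strictly before duration_s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times = []
    i = 0
    while i * interval_s < duration_s:
        times.append(format_timestamp(i * interval_s))
        i += 1
    return times


# A hypothetical 17-second clip sampled every 5 seconds
print(uniform_sample_times(17, 5))
# ['00:00:00.000', '00:00:05.000', '00:00:10.000', '00:00:15.000']
```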

2. Video Highlight Extraction

Our video agents can extract the best moments from a video, such as scoring plays and other key actions, and return each one as a separate clip with its start and end times.
Extract the 3 best moments from this video, including the start and end times of each moment.

Example of 3 highlights extracted from the video.

import openai
from pydantic import BaseModel, Field
from typing import List

class HighlightVideo(BaseModel):
    start_time: str = Field(..., description="Start time of the segment, in HH:MM:SS.MS format")
    end_time: str = Field(..., description="End time of the segment, in HH:MM:SS.MS format")
    url: str = Field(..., description="The URL of the extracted segment")

class HighlightExtractionResponse(BaseModel):
    segments: List[HighlightVideo] = Field(..., description="List of extracted segments")

# Initialize the client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)

# Extract multiple segments
response = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {
            "role": "user",
            # The video is passed as a content part of the user message
            "content": [
                {"type": "text", "text": "Extract the 3 best moments from this video, including the start and end times of each moment."},
                {"type": "video_url", "video_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video.transcription/bakery.mp4"}}
            ]
        }
    ],
    response_format={"type": "json_schema", "schema": HighlightExtractionResponse.model_json_schema()}
)

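# Print the raw JSON response
print(response.choices[0].message.content)
# Example output: {"segments": [{"start_time": "...", "end_time": "...", "url": "https://..."}, ...]}

# Validate the response against the schema
print(HighlightExtractionResponse.model_validate_json(response.choices[0].message.content))
# Example output: HighlightExtractionResponse(segments=[HighlightVideo(start_time='...', end_time='...', url='https://...'), ...])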
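The schema guarantees that each segment has a start_time, end_time, and url, but not that the segments are well ordered or that the agent returned the number you asked for. A minimal client-side sanity check might look like the sketch below. It assumes timestamps in HH:MM:SS.MS form; parse_timestamp and check_segments are hypothetical helpers, and the sample segments are made up.

```python
def parse_timestamp(ts: str) -> float:
    """Parse an HH:MM:SS.MS timestamp (e.g. "00:01:02.250") into seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def check_segments(segments: list, expected_count: int) -> list:
    """Return human-readable problems found in highlight segments (empty list if none)."""
    problems = []
    if len(segments) != expected_count:
        problems.append(f"expected {expected_count} segments, got {len(segments)}")
    # Sort by start time so overlaps can be detected between neighbours
    spans = sorted(
        (parse_timestamp(s["start_time"]), parse_timestamp(s["end_time"])) for s in segments
    )
    for i, (start, end) in enumerate(spans):
        if end <= start:
            problems.append(f"segment starting at {start}s has non-positive length")
        if i > 0 and start < spans[i - 1][1]:
            problems.append(f"segment starting at {start}s overlaps the previous one")
    return problems


# Hypothetical segments shaped like HighlightVideo (url omitted for brevity)
segments = [
    {"start_time": "00:00:02.000", "end_time": "00:00:06.500"},
    {"start_time": "00:00:06.000", "end_time": "00:00:09.000"},
]
print(check_segments(segments, expected_count=3))
# ['expected 3 segments, got 2', 'segment starting at 6.0s overlaps the previous one']
```

Whether overlapping highlights count as an error depends on your use case; treat this as a starting point rather than a rule.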

3. Time-Based Trimming

Extract specific segments from videos with precise start and end timestamps.
Trim the video from 10 seconds to 30 seconds

Example of trimming the video to a 20-second segment (from 10 s to 30 s).

import openai
from pydantic import BaseModel, Field

class VideoTrimmingResponse(BaseModel):
    start_time: str = Field(..., description="The start time of the trimmed video (HH:MM:SS.MS format)")
    end_time: str = Field(..., description="The end time of the trimmed video (HH:MM:SS.MS format)")
    url: str = Field(..., description="The URL of the trimmed video")

# Initialize the client
client = openai.OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>"
)

# Trim the video to the requested time range
response = client.chat.completions.create(
    model="vlm-agent-1",
    messages=[
        {
            "role": "user",
            # The video is passed as a content part of the user message
            "content": [
                {"type": "text", "text": "Trim the video from 10 seconds to 30 seconds"},
                {"type": "video_url", "video_url": {"url": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video.transcription/bakery.mp4"}}
            ]
        }
    ],
    response_format={"type": "json_schema", "schema": VideoTrimmingResponse.model_json_schema()}
)

# Print the raw JSON response
print(response.choices[0].message.content)
# Example output: {"start_time": "00:00:10.000", "end_time": "00:00:30.000", "url": "https://.../trimmed.mp4"}

# Validate the response against the schema
print(VideoTrimmingResponse.model_validate_json(response.choices[0].message.content))
# Example output: VideoTrimmingResponse(start_time='00:00:10.000', end_time='00:00:30.000', url='https://.../trimmed.mp4')
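Because the agent interprets a natural-language request, it is worth confirming that the returned cut points match the range you asked for. This sketch compares a response with the requested 10 s to 30 s range. The helpers to_seconds and trim_matches are hypothetical, and the 0.5 s tolerance is an arbitrary assumption; since cuts may be aligned to keyframes (see the FAQ), you may need a larger tolerance for some videos.

```python
from datetime import datetime


def to_seconds(ts: str) -> float:
    """Convert an HH:MM:SS.mmm timestamp to seconds (strptime limits hours to < 24)."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def trim_matches(response: dict, start_s: float, end_s: float, tol_s: float = 0.5) -> bool:
    """True if the returned cut points are within tol_s seconds of the requested ones."""
    return (
        abs(to_seconds(response["start_time"]) - start_s) <= tol_s
        and abs(to_seconds(response["end_time"]) - end_s) <= tol_s
    )


# Parsed response shaped like the example output above
resp = {"start_time": "00:00:10.000", "end_time": "00:00:30.000", "url": "https://.../trimmed.mp4"}
print(trim_matches(resp, 10, 30))  # True
print(to_seconds(resp["end_time"]) - to_seconds(resp["start_time"]))  # 20.0
```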

FAQ

Supported formats and output quality:

  • MP4: Most common format with excellent compatibility
  • MOV: Apple QuickTime format
  • AVI: Windows video format
  • MKV: Matroska video format
  • WebM: Web-optimized format
  • Quality Preservation: Maintains original video quality in trimmed segments

Frame sampling strategies:

  • Uniform Sampling: Extract frames at regular intervals (e.g., every 1-5 seconds)
  • Keyframe Sampling: Extract only keyframes for efficient analysis
  • Scene-Based: Sample based on scene changes for better content analysis
  • Quality Balance: Choose an appropriate sampling rate based on your analysis needs

Trimming precision:

  • Millisecond Precision: Cut videos to exact time ranges with millisecond accuracy
  • Keyframe Alignment: Align cuts to the nearest keyframes for clean edits
  • Smart Boundaries: Automatically detect optimal cut points
  • Quality Preservation: Maintain video quality without re-encoding when possible
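Keyframe alignment is a general video-editing technique: a cut made without re-encoding (stream copy) can only start cleanly on a keyframe, so the requested range is commonly widened to the surrounding keyframes. The sketch below illustrates the idea with a sorted list of hypothetical keyframe times. It is not VLM Run's implementation; align_to_keyframes is a hypothetical helper, and widening outward (previous keyframe for the start, next keyframe for the end) is one assumed policy among several.

```python
from bisect import bisect_left, bisect_right


def align_to_keyframes(start_s: float, end_s: float, keyframes: list) -> tuple:
    """Widen [start_s, end_s] outward to keyframe boundaries.

    keyframes: sorted keyframe times in seconds (hypothetical input; a real
    pipeline would read these from the video stream).
    """
    if not keyframes:
        raise ValueError("need at least one keyframe")
    # Last keyframe at or before start_s (fall back to the first keyframe)
    i = bisect_right(keyframes, start_s) - 1
    aligned_start = keyframes[max(i, 0)]
    # First keyframe at or after end_s (keep end_s if it is past the last keyframe)
    j = bisect_left(keyframes, end_s)
    aligned_end = keyframes[j] if j < len(keyframes) else end_s
    return aligned_start, aligned_end


# Hypothetical keyframes every 4 seconds
keyframes = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
print(align_to_keyframes(10.0, 30.0, keyframes))  # (8.0, 32.0)
```

Widening outward keeps the whole requested range in the clip at the cost of a few extra seconds; snapping inward would do the opposite.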

Try Video Trimming

Experience video trimming and frame sampling with live examples in our interactive chat interface