Transcribing Audio
Learn how to transcribe and analyze long-form audio.
While traditional speech-to-text models focus solely on transcription, vlm-1 can simultaneously generate rich, structured insights from audio content. This includes transcription, chapter segmentation, topic extraction, entity recognition, and sentiment analysis, all in a single API call. These capabilities are particularly valuable for podcast analysis, interview processing, and content management systems. In this guide, we’ll walk you through how to use the audio.transcription domain to transcribe and analyze long-form audio content.
In subsequent guides, we’ll cover more advanced capabilities like topic extraction, entity recognition, and sentiment analysis.
Audio Transcription Demo
Head over to the audio-transcription playground in our hub to see audio transcription in action.
Analyzing Podcast Episodes
Let’s look at a podcast analysis example to see how vlm-1 can be used to extract structured insights from audio content. In this example, we’ll use vlm-1 to transcribe and analyze a podcast episode, generating segmented chapters with start and end timestamps and a corresponding full transcript that can be used for content organization and discovery.
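As a rough sketch of what such a call might look like, the snippet below builds (but does not send) an HTTP request for the audio.transcription domain. Only the model name vlm-1 and the domain name come from this guide. The endpoint URL, the `url` payload field, and the bearer-token header are placeholders assumed for illustration, so check the API reference for the real values.

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint, not the real VLM Run URL.
API_URL = "https://api.example.com/v1/audio/generate"


def build_transcription_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a POST request for the audio.transcription domain."""
    payload = {
        "model": "vlm-1",                 # model name used in this guide
        "domain": "audio.transcription",  # domain name used in this guide
        "url": audio_url,                 # ASSUMPTION: hypothetical field name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token authentication
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_transcription_request("https://example.com/episode.mp3", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any network call is made.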
Understanding the Output
Here’s an example of the output from the audio.transcription domain:
Let’s break down the output into its key components:
- segments: Segmented audio content with start and end timestamps.
- segment.start_time: The start time of the segment in seconds (relative to the start of the audio).
- segment.end_time: The end time of the segment in seconds (relative to the start of the audio).
- segment.content: The raw transcription of the spoken content.
- metadata.duration: The total duration of the audio in seconds.
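To make the fields above concrete, here is a minimal sketch that turns a response into a readable chapter list. The field names follow the descriptions above, but the nesting (a list of segment objects under `segments`, and `duration` under `metadata`) is an assumption, and the sample values are invented purely for illustration.

```python
from datetime import timedelta

# Illustrative response only: field names follow this guide, values are invented.
response = {
    "segments": [
        {"start_time": 0.0, "end_time": 95.5, "content": "Welcome to the show."},
        {"start_time": 95.5, "end_time": 412.0, "content": "Our guest today joins us to talk about audio."},
    ],
    "metadata": {"duration": 412.0},
}


def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS, dropping fractional seconds."""
    return str(timedelta(seconds=int(seconds)))


def chapter_lines(result: dict) -> list[str]:
    """Return one '[start - end] content' line per segment."""
    return [
        f"[{format_timestamp(s['start_time'])} - {format_timestamp(s['end_time'])}] {s['content']}"
        for s in result["segments"]
    ]


for line in chapter_lines(response):
    print(line)
print("Total duration:", format_timestamp(response["metadata"]["duration"]))
```

Because the timestamps are plain seconds relative to the start of the audio, they can be passed straight to a media player's seek function or stored alongside the transcript for search.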
Key Features
- Temporal Grounding: Precise time segmentation and content localization
- Long-form Support: Process recordings of 12+ hours with automatic segmentation
- Batch Processing: Efficient handling of large audio collections
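One way to use temporal grounding downstream is to map a playback position back to the spoken content at that moment. The sketch below assumes the segments are sorted by start time and do not overlap. The helper name `segment_at` and the sample data are hypothetical and not part of the API.

```python
from bisect import bisect_right
from typing import Optional

# Invented sample segments, shaped like the fields described in this guide.
segments = [
    {"start_time": 0.0, "end_time": 95.5, "content": "Introduction"},
    {"start_time": 95.5, "end_time": 412.0, "content": "Interview"},
]


def segment_at(segs: list[dict], t: float) -> Optional[dict]:
    """Return the segment whose [start_time, end_time) interval contains t, else None.

    Assumes segs is sorted by start_time and segments do not overlap.
    """
    starts = [s["start_time"] for s in segs]
    # Index of the last segment that starts at or before t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < segs[i]["end_time"]:
        return segs[i]
    return None


hit = segment_at(segments, 120.0)
print(hit["content"] if hit else "no segment at this time")
```

Binary search keeps each lookup logarithmic in the number of segments, which matters for multi-hour recordings with thousands of segments.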
Get Started with our Audio -> JSON API
Head over to our Audio -> JSON API to start building your own audio processing pipeline with VLM Run. Sign up for access on our platform.