VLM-1 can be used to extract rich insights from audio podcasts and interviews. The API transcribes the audio content and extracts structured metadata such as topics, entities, and sentiment. This is useful for content analysis, search, and recommendation systems.

Notebook Example

If you simply want to look at the code, skip directly to the Colab notebook here.

Key Features

In the sections below, we’ll showcase a few notable features of the API for analyzing podcasts or audio interviews.

1. Automatic Chaptering and Topic Extraction

VLM-1 can automatically generate chapter summaries for audio podcasts or interviews. This is useful for creating a table of contents for the audio, or for summarizing the key points discussed in the episode. As the sample output below shows, the API segments the audio transcription with relevant timestamps and extracts the topics discussed in each individual segment. With this simplified API, the entire episode can be split into chapters based on the topics discussed, and the API can even generate catchy titles for each segment.
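To make the chaptered output concrete, here is a minimal sketch of how such segments might be represented and turned into a table of contents. Note that the `Chapter` schema and its field names are hypothetical illustrations of the kind of structured output described above, not the actual VLM-1 response format.

```python
from dataclasses import dataclass

@dataclass
class Chapter:
    # Hypothetical chapter schema: start/end timestamps in seconds,
    # a generated title, and the topics discussed in the segment.
    start: float
    end: float
    title: str
    topics: list[str]

def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS for a table of contents."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def table_of_contents(chapters: list[Chapter]) -> str:
    """Build a human-readable table of contents from chapter segments."""
    return "\n".join(
        f"[{format_timestamp(c.start)}-{format_timestamp(c.end)}] "
        f"{c.title} ({', '.join(c.topics)})"
        for c in chapters
    )

# Example data standing in for a parsed API response.
chapters = [
    Chapter(0, 185, "Welcome and Guest Intro", ["introductions"]),
    Chapter(185, 642, "Scaling Vision-Language Models", ["VLMs", "scaling laws"]),
]
print(table_of_contents(chapters))
```

Once the response is parsed into typed segments like this, generating show notes, deep links, or a clickable chapter list is a small formatting step.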

2. Extracting Entities and Sentiment Analysis

VLM-1 can also extract entities and perform sentiment analysis on the audio content. This is useful for identifying the key entities mentioned in each individual segment and for understanding the sentiment at different points in a conversation. As the sample output below shows, the API extracts the entities mentioned in the audio content and provides a sentiment prediction ("positive", "negative", "neutral") for each segment. This can be used to track sentiment over time, or to identify key topics of interest and preference in the conversation.
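The segment-level entities and sentiment labels described above could be consumed as sketched below. The `Segment` structure and its field names are hypothetical; only the three sentiment labels ("positive", "negative", "neutral") come from the sample output.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Segment:
    # Hypothetical per-segment result: a start timestamp, one of the
    # three sentiment labels, and the entities mentioned in the segment.
    start: float
    sentiment: str  # "positive" | "negative" | "neutral"
    entities: list[str] = field(default_factory=list)

def sentiment_timeline(segments: list[Segment]) -> list[tuple[float, str]]:
    """Pair each segment's start time with its sentiment, for tracking over time."""
    return [(s.start, s.sentiment) for s in segments]

def entity_mentions(segments: list[Segment]) -> Counter:
    """Count how often each entity is mentioned across the episode."""
    return Counter(e for s in segments for e in s.entities)

# Example data standing in for a parsed API response.
segments = [
    Segment(0.0, "neutral", ["OpenAI"]),
    Segment(120.0, "positive", ["OpenAI", "GPT-4"]),
    Segment(300.0, "negative", ["GPT-4"]),
]
print(sentiment_timeline(segments))
print(entity_mentions(segments).most_common())
```

The timeline pairs can be plotted directly to visualize how the tone of the conversation shifts, while the mention counts surface the entities the episode spends the most time on.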