Long-context Outputs
Support for long-output contexts for domains like audio/video transcription, exceeding 8K token limits.
Long-context Outputs Demo
Navigate over to the video-transcription playground in our hub to see the long-context outputs in action.
VLM Run provides robust support for processing and extracting structured data from long-context inputs like audio and video files. This capability enables you to work with extended content that would exceed the token limits of many foundation models (typically around 8K tokens).
What are Long-context Outputs?
Long-context outputs refer to VLM Run’s ability to process and extract structured data from extended output content like:
- Transcripts of long-form audio or video content (12+ hours of audio or 4+ hours of video)
- Extracted structured data from multi-page documents (>128 pages)
- Extracted structured data from large collections of related images (>128 images)
This capability is essential for applications that deal with lengthy content such as podcast transcriptions, lecture recordings, interviews, meetings, and extended video analysis.
Using Long-context Processing
You can process long-form audio and video content using the VLM Run API with batch processing enabled:
Long-form Video / Audio Processing (with batch processing)
The request below is submitted in batch=True mode.
from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, PredictionResponse
# Initialize the client
client = VLMRun(api_key="your-api-key")
# Submit a prediction request with `batch=True`
prediction: PredictionResponse = client.video.generate(
    file=Path("path/to/long_video.mp4"),  # Can be up to 4 hours long
    domain="video.transcription",  # for audio files, use `client.audio.generate` with `audio.transcription`
    batch=True,
    config=GenerationConfig(
        max_tokens=65_536,
    ),
)
# You can also fetch the prediction manually by its ID and check its status.
# The response status can take the following values: "pending" | "running" | "completed" | "failed"
# while <condition>:
#     prediction: PredictionResponse = client.predictions.get(id=prediction.id)
#     ...
For batch processing, you are provided with a prediction ID and can check the status of the prediction later. We provide a polling mechanism and some convenience functions to check the status of the prediction and wait for it to complete.
# Wait for the prediction to complete (with a timeout of 600 seconds)
prediction: PredictionResponse = client.predictions.wait(id=prediction.id, timeout=600)
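If you prefer to control the polling yourself, the commented-out loop above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `wait_for_prediction` is a hypothetical helper, and it assumes the object returned by the fetch callable exposes a `status` attribute holding one of the four values listed above.

```python
import time
from typing import Any, Callable

# Statuses after which polling should stop (taken from the list of values above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_prediction(
    fetch: Callable[[str], Any],
    prediction_id: str,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll `fetch(prediction_id)` until the status is terminal or `timeout` elapses.

    `fetch` is any callable returning an object with a `.status` attribute
    (an assumption made for this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(prediction_id)
        if prediction.status in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Prediction {prediction_id} still '{prediction.status}' after {timeout}s"
            )
        sleep(interval)


# Hypothetical usage with the client from the example above:
# prediction = wait_for_prediction(
#     lambda pid: client.predictions.get(id=pid), prediction.id
# )
```

Passing the fetch function in as an argument keeps the loop independent of the client, which makes it easy to exercise with a stub. A "failed" status is returned rather than raised, so the caller decides how to handle it.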
Domain-specific Schemas
VLM Run provides specialized schemas for different types of long-form content:
- audio.transcription: General-purpose audio transcription with speaker detection
- video.transcription: General-purpose video transcription with visual scene analysis
- video.transcription-summary: Summary of a video transcription with key points and speaker analysis
- video.conferencing-summary: Summary of a video conference with key points and speaker analysis
- video.tv-news-summary: Summary of a TV news broadcast with anchors, reporters, chyrons, and segments
- video.dashcam: Analysis of a dashcam video with scene analysis and spoken language detection
Refer to the Hub Catalog for more information on the schemas supported by VLM Run.
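When a pipeline handles a mix of audio and video files, the domain has to be chosen per file. One possible way to do that is sketched below. `pick_transcription_domain` is a hypothetical helper, and the suffix sets are illustrative assumptions, not a list of the formats VLM Run accepts.

```python
from pathlib import Path

# Illustrative suffix sets (assumptions for this sketch, not an official list).
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".webm"}


def pick_transcription_domain(path: Path) -> str:
    """Map a media file to one of the general-purpose transcription domains."""
    suffix = path.suffix.lower()
    if suffix in AUDIO_SUFFIXES:
        return "audio.transcription"
    if suffix in VIDEO_SUFFIXES:
        return "video.transcription"
    raise ValueError(f"Unrecognized media type: {path.name}")
```

The more specialized domains (summaries, dashcam analysis) depend on what the content is, not on the file type, so they are left to the caller.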
Example: Transcription of a YC Podcast Episode
Here’s an example of a long-context output for a YC episode, How New Technology Creates New Businesses. The output is a list of temporal segments, each grounded with start and end times and carrying both an audio transcription and a visual description of the content.
{
"metadata": {
"language": null,
"content": null,
"topics": null,
"duration": 488.56
},
"segments": [
{
"start_time": 0,
"end_time": 25.8,
"audio": {
"content": " Like the only way to find these opportunities to learn about them is to find weirdos on the internet that are also into this thing. Yes. And they're figuring it out too. And you can kind of compare notes. Yes. And this is how new industries are created. Literally. By weirdos on the internet. Like literally. Literally. This is Dalton, plus Michael, and today we're going to talk about why AI is going to create more successful founders in the world."
},
"video": {
"content": "Two men are engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is gesturing with his hands as he speaks. The man on the right, dressed in a blue shirt, listens attentively and occasionally responds with hand gestures. They appear to be in a professional setting, possibly an office or conference room, with large windows in the background allowing natural light to fill the space."
}
},
{
"start_time": 25.8,
"end_time": 51.71,
"audio": {
"content": " It's interesting, as we've gotten older, we kind of see a new set of tools come into the market and then an explosion in the number of founders who can now create value. And we've seen this before, right? Like, what was the first time you saw this? I certainly noticed when the internet was new, people that knew how to build websites were suddenly able to make lots of money from"
},
"video": {
"content": "The video features two individuals engaged in a conversation at a table. The person on the left, wearing a light gray shirt, is facing the person on the right, who is dressed in a blue jacket over a black shirt. The background is minimalistic, with a plain wall and a window allowing natural light to enter. The text overlay on the left side of the screen reads \"AI Will Create More Successful Founders\" and \"Founder Explosion.\" On the right side, there is a list titled \"Founder Explosion\" with various items such as \"On The Cusp,\" \"Cost Of Business,\" \"Get In Early,\" \"Whatnot,\" \"Endless Opportunity,\" and \"Internet Weirdos.\" The conversation appears to be focused on the impact of artificial intelligence on business and entrepreneurship."
}
},
{
"start_time": 51.71,
"end_time": 71.89,
"audio": {
"content": " the skill. And it was like really basic stuff. High school kids were making tons of money. Yep. I remember people that could just figure out how to sell stuff on eBay, where you would go buy something cheap but then listed on eBay and arbitrage. Yep. Basically, you would see people that kind of understood the new tooling that came out and would like do a hustle and make ungodly amounts of money."
},
"video": {
"content": "The video features two men engaged in a conversation in an office setting. The man on the left, wearing a light gray button-up shirt, is actively speaking and gesturing with his hands, while the man on the right, dressed in a blue jacket over a black shirt, listens attentively with his arms crossed. The background includes a large window with blinds partially drawn, allowing natural light to filter into the room. The conversation appears to be focused on business-related topics, as indicated by the text on the right side of the screen, which lists various themes such as 'On The Cusp,' 'Cost Of Business,' 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' and 'Internet Weirdos.' The overall atmosphere suggests a professional discussion."
}
},
{
"start_time": 72.01,
"end_time": 92.67,
"audio": {
"content": " Yeah. And it was just because they understood the new tools. And I already wasn't even a hustle. Like it was a good business. Like it was, they saw that tools enabled new businesses. You know, we saw this, you know, tail end of the open source world where like we could build all of Justin TV with free software. Yep."
},
"video": {
"content": "The video features two men engaged in a conversation in an office setting. The man on the left, wearing a light gray shirt, is gesturing animatedly with his hands as he speaks, indicating an active discussion. The man on the right, dressed in a blue jacket over a black shirt, listens attentively with his hands clasped together on the table. The background includes a window with blinds partially open, allowing natural light to filter into the room. On the right side of the screen, there is a vertical list titled \"Founder Explosion\" with various topics such as \"On The Cusp,\" \"Cost Of Business,\" \"Get In Early,\" \"Whatnot,\" \"Endless Opportunity,\" and \"Internet Weirdos.\""
}
},
{
"start_time": 92.67,
"end_time": 112.85,
"audio": {
"content": " And then we were there in the beginning of cloud compute where we didn't have to rack servers anymore. Any kid could sign up for an Amazon account, put a couple bucks down, and get access to a server. And so what's interesting is that we might, I think we feel pretty good about saying this, we might be on"
},
"video": {
"content": "The video features a conversation between two men seated at a table in a modern office setting. The man on the right, wearing a blue shirt and glasses, is speaking animatedly, gesturing with his hands as he discusses various topics related to entrepreneurship and business. The man on the left, dressed in a light-colored shirt, listens attentively, occasionally nodding and responding. The background includes a white wall and a window, suggesting a professional environment. On the right side of the screen, there is a list of topics being discussed, such as 'Founder Explosion,' 'On The Cusp,' 'Cost Of Business,' 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' and 'Internet Weirdos.'"
}
},
{
"start_time": 112.85,
"end_time": 135.69,
"audio": {
"content": " the cusp of the next one of these. And that means there are maybe a whole bunch of new opportunities for successful businesses to be created. Yeah, starting now. Yeah, I mean, here's another metaphor. When the iPhone came out, who would have thought that Flappy Bird would have been created? And I think I read that that guy made like 20 million in cash."
},
"video": {
"content": "The video features two men engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is speaking animatedly, gesturing with his hands as he talks. The man on the right, dressed in a blue jacket over a black shirt, listens attentively, occasionally nodding and responding. The setting appears to be an office or meeting room with large windows in the background, allowing natural light to fill the space. The overall atmosphere suggests a professional discussion or interview."
}
},
{
"start_time": 135.87,
"end_time": 159.75,
"audio": {
"content": " Boom. In like two months and then shut it down. And so if you watch, okay, iPhone, Steve Jobs on stage, some guy in Southeast Asia building Flappy Bird. That's like wild. Never would have guessed. And so, again, to be very direct, what we're arguing is that when brand new technologies come out that are powerful, the people that are on the cusp of understanding them and that quickly"
},
"video": {
"content": "Two men are engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is gesturing animatedly with his hands as he speaks. The man on the right, dressed in a blue jacket, listens attentively, occasionally nodding and smiling. The background features a large window with a view of a body of water, suggesting an indoor setting with natural light."
}
},
{
"start_time": 159.75,
"end_time": 180.77,
"audio": {
"content": " build businesses or build useful things using those tools have a very unique view of creating businesses and wealth. And again, to be on the nose for AI, it seems like you can do things that would require way more headcount than you would otherwise. Yes. And so, you know, we're not even saying we know the ideas."
},
"video": {
"content": "The video features two men engaged in a conversation in an indoor setting. The man on the left, wearing a light gray button-up shirt, is actively gesturing with his hands as he speaks, indicating an animated discussion. The man on the right, dressed in a dark blue shirt, listens attentively, occasionally nodding and responding. The background includes a window with blinds, suggesting a modern office or studio environment. The video also displays a sidebar with various topics such as 'On The Cusp,' 'Cost Of Business,' 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' 'Internet Weirdos,' and 'New Is The Time,' which likely relate to the conversation's themes."
}
},
{
"start_time": 180.97,
"end_time": 201.72,
"audio": {
"content": " No. We're just saying if you're watching this and you're interested in being a founder or maybe not working at a company. Yeah. And you just pay attention to every new thing that comes out and try to find these opportunities or, I don't't know arbitrage is the right word, but no, just you know, new opportunities. New opportunities using these cutting edge tools"
},
"video": {
"content": "The video depicts a conversation between two men seated at a table in an office setting. The man on the left, wearing a light gray shirt, is gesturing animatedly with his hands as he speaks, while the man on the right, dressed in a blue jacket over a black shirt, listens attentively with his hands clasped together. The background features large windows with a view of a cityscape, and the room has a modern, minimalist design with white walls and a light-colored floor. The conversation appears to be focused and engaged, with both individuals actively participating in the dialogue."
}
},
{
"start_time": 201.72,
"end_time": 221.8,
"audio": {
"content": " and you're on the bleeding edge, you're not competing with anyone. No. It's green field. I think what's cool is any time one of these technologies shifts happens, the cost of starting a business, some set of businesses, reduces by up to like 10x. Yep. And so suddenly, businesses that either wouldn't have made sense"
},
"video": {
"content": "The video features two men engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is gesturing with his hands as he speaks, while the man on the right, dressed in a blue jacket over a black shirt, listens attentively. The setting appears to be an office or meeting room with large windows in the background, allowing natural light to fill the space. The conversation seems to revolve around business topics, as indicated by the text on the right side of the screen, which includes phrases like 'Cost Of Business,' 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' 'Internet Weirdos,' and 'Now Is The Time.' The overall atmosphere suggests a professional discussion."
}
},
{
"start_time": 221.8,
"end_time": 244.96,
"audio": {
"content": " or certainly a normal person couldn't just stand up and do, right? Like can you imagine just, oh, it's pre online selling in eBay. All you have to do is rent a storefront and run a store, right? Like that's cheap, right? Like, absolutely not. Or like pre-Ari-NB. Like, all you have to do is just like buy a house and set up your own air bed and breakfast"
},
"video": {
"content": "The video features two men engaged in a conversation at a table in an office setting. The man on the left, wearing a light gray shirt, listens attentively while the man on the right, dressed in a blue jacket over a black shirt, gestures animatedly as he speaks. The background includes large windows with a view of a body of water, suggesting a modern and open environment. The conversation appears to be focused on business topics, as indicated by the text on the right side of the screen, which lists various themes such as 'Cost Of Business,' 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' 'Internet Weirdos,' and 'Now Is The Time.' The overall atmosphere is professional and collaborative."
}
},
{
"start_time": 244.96,
"end_time": 265.28,
"audio": {
"content": " bed and breakfast thing or even by hotel yeah that's crazy crazy. Whereas like Airbnb can rent a room. And think about it. Your own place. If you saw Airbnb early and you just decided to be a host and be like, oh, I should like do this as a business. You could do it pretty well. You could do it pretty well. Yeah. When Shopify was a brand new thing, like all of these platforms exactly the people that were the first to recognize that these were"
},
"video": {
"content": "The video features two men engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is gesturing with his hands as he speaks, while the man on the right, dressed in a blue jacket over a black shirt, listens attentively. The background is minimalistic, with a plain wall and a window that lets in natural light. On the right side of the screen, there is a list of topics or themes, including \"Cost Of Business,\" \"Get In Early,\" \"Whatnot,\" \"Endless Opportunity,\" \"Internet Weirdos,\" and \"Now Is The Time.\" The overall setting appears to be a casual interview or discussion."
}
},
{
"start_time": 265.84,
"end_time": 286.22,
"audio": {
"content": " gave them leverage yes those entrepreneurial-minded people did really well. Yes. And so I think what's so cool is that what we're saying is like if you're ambitious and you're paying attention, you might not ever need to work at a big company. You might not ever need to have a boss."
},
"video": {
"content": "The video features two men engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is gesturing animatedly with his hands as he speaks. The man on the right, dressed in a blue jacket, listens attentively, occasionally responding with his own gestures. The setting appears to be an office or meeting room with large windows in the background, allowing natural light to fill the space. The conversation seems to revolve around business or startup topics, as indicated by the text on the right side of the screen, which includes phrases like 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' 'Internet Weirdos,' and 'Now Is The Time.' The overall atmosphere suggests a professional and collaborative discussion."
}
},
{
"start_time": 286.22,
"end_time": 307.96,
"audio": {
"content": " Like you can be in control of your own destiny. And these moments don't happen every week. No. Like, we wish we did. It would be the investor. But like, when they do, the people who move. I mean, this is a very specific example. Yeah. But whatnot, the online live shopping thing, I still talk to the founders a lot."
},
"video": {
"content": "Two men are sitting at a table in a modern office setting, engaged in a conversation. The man on the left is wearing a light gray button-up shirt and has short, curly hair. He appears to be listening attentively to the man on the right, who is bald, wearing glasses, and dressed in a blue jacket over a black shirt. The man on the right is gesturing with his hands as he speaks, indicating an animated discussion. The background features large windows with a view of a cityscape, suggesting an urban environment. On the right side of the screen, there is a list of topics or themes related to business and entrepreneurship, such as 'Founder Explosion,' 'On The Cusp,' 'Cost Of Business,' 'Get In Early,' 'Whatnot,' 'Endless Opportunity,' 'Internet Weirdos,' and 'Now Is The Time.'"
}
},
{
"start_time": 308.22,
"end_time": 328,
"audio": {
"content": " And they have, I think, like, high school-aged kids selling stuff on their... Making real money, right? And making just, again, I don't even want to say the numbers. Yeah. But they figured out the format. They understand how to use whatnot. They built a user base there. Yeah. And they're basically... They're making enough money to set themselves up for their entire life."
},
"video": {
"content": "A man with curly hair and a light gray shirt is speaking animatedly to another man who has a shaved head and glasses. The man with curly hair uses expressive hand gestures as he talks, while the other man listens attentively. The background features a window with blinds, and there is a menu or list of topics on the right side of the screen, including \"Founder Explosion,\" \"On The Cusp,\" \"Cost Of Business,\" \"Get In Early,\" \"Whatnot,\" \"Endless Opportunity,\" \"Internet Weirdos,\" and \"Now Is The Time.\""
}
},
{
"start_time": 328.12,
"end_time": 351.76,
"audio": {
"content": " Yeah. By just seeing this new platform, figuring it out, and then making a bet on it. Yes. I mean, this happened with Twitch. Happens with Twitch. Whole new industry, basically. Yeah. No, I think that what's cool is that we're also talking about every scale, right? We're talking about things that can be venture backed, maybe billion dollar companies one day. But we're also talking about things that can just"
},
"video": {
"content": "The video features two men engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is gesturing with his hands as he speaks, while the man on the right, dressed in a blue jacket over a black shirt, listens attentively with his hands clasped together. The background includes a large window with a view of a body of water, suggesting an indoor setting with natural light. The conversation appears to be casual and focused, with both individuals actively participating in the dialogue."
}
},
{
"start_time": 351.76,
"end_time": 372.66,
"audio": {
"content": " set you up so that you can pay your rent and live a good life. Yep. The opportunities are across the entire spectrum. And I think that's what's really cool about new technology. Like when there's a real technology shift, it affects businesses across the board. We're not just talking about companies that YCP even fun. I think that the last point I'd want to make"
},
"video": {
"content": "A man in a blue shirt is seated at a table, engaged in a conversation with another person whose back is facing the camera. The man in the blue shirt is gesturing with his hands as he speaks, indicating an animated discussion. The setting appears to be an office or a professional environment, with a window and some furniture visible in the background. The overall atmosphere suggests a serious and focused conversation."
}
},
{
"start_time": 372.66,
"end_time": 398.57,
"audio": {
"content": " on this front is that you don't get this opportunity if you're just thinking, you gotta actually do. Well, and they won't teach you this in schools. Schools teach you stuff for 10 or 20 years ago. So the other thing that I've noticed in these trends is that when you are part of the history being made and you're this early on the cutting edge of a new tech coming out, you can't expect your university or your teachers or your peers people in your community"
},
"video": {
"content": "Two men are engaged in a conversation at a table in an office setting. The man on the left, wearing a light gray shirt, is gesturing with his hands as he speaks, while the man on the right, dressed in a blue jacket over a black shirt, listens attentively. The background features large windows with a view of a cityscape, and the room has a modern, minimalist design with white walls and a light-colored floor. The conversation appears to be focused and serious, with both individuals maintaining eye contact and using expressive hand movements."
}
},
{
"start_time": 398.57,
"end_time": 419.33,
"audio": {
"content": " or your peers to teach you about it. It's only basically weirdos on the internet. Yes. Like the only way to find these opportunities to learn about them is to find weirdos on the internet that are also into this thing. Yes. And they're figuring it out too. that are also into this thing. And they're figuring it out too. And you can kind of compare notes. Yes. And this is how new industries are created. Literally. By weirdos on the internet. Like literally. Literally. By weirdos on the internet. Like literally. Literally, there's like some subreddit with a bunch of weirdos."
},
"video": {
"content": "The video features two men engaged in a conversation at a table. The man on the left, wearing a light gray shirt, is animatedly gesturing with his hands as he speaks, while the man on the right, dressed in a blue jacket, listens attentively with his hands clasped together. The setting appears to be a modern office or conference room with large windows in the background, allowing natural light to fill the space. The conversation seems to be casual and friendly, with both individuals appearing relaxed and engaged."
}
},
{
"start_time": 419.33,
"end_time": 447.2,
"audio": {
"content": " And like someday from now, you know, 10 years from now, there'll be an entire industry of people that learned about this thing in some subred somewhere there. Yeah, no, I totally agree. So hey, the big takeaway is if you've been wrestling your lawyers, if you thought, oh, this isn't the time to start a new business. Maybe you should reconsider. Yeah. This is a very interesting time. I think the final argument is there's a good case where a smaller percentage of the population will need to get jobs,"
},
"video": {
"content": "The video features two men engaged in a conversation in an office setting. The man on the left, wearing a light gray shirt, is animatedly speaking and using hand gestures to emphasize his points. He appears to be explaining something with enthusiasm. The man on the right, dressed in a blue jacket over a black shirt, listens attentively, occasionally nodding and responding. The background includes a window with blinds, suggesting a modern office environment. The conversation seems to revolve around business or technology topics, as indicated by the text overlays such as 'Cost Of Business' and 'Endless Opportunity.'"
}
},
{
"start_time": 447.26,
"end_time": 469.08,
"audio": {
"content": " and more people will be able to use tools like this to be self-employed in some way. I don't think there's any, I think all the structural changes imply that more folks will just use their highly leveraged selves using all these tools to run businesses, then have to go get a job. Yeah. Right? And I think that story isn't told, right? I think the story is always this kind of depressing story of like,"
},
"video": {
"content": "The video features two men engaged in a conversation in an indoor setting. The man on the left, wearing a light gray button-up shirt, is actively gesturing with his hands as he speaks, indicating an animated discussion. The man on the right, dressed in a blue jacket over a black shirt, listens attentively, occasionally nodding and responding. The background includes a window with blinds, suggesting a modern office or conference room environment. The overall atmosphere appears to be professional and focused."
}
},
{
"start_time": 469.08,
"end_time": 488.56,
"audio": {
"content": " oh, maybe you won't need it, you won't be needed anymore, as opposed to here's a set of tools. You could do things that people couldn't think of doing affordably before. Like you could be your own boss. You don't even need to be inside of a company to create value. Yeah. So anyways, hopefully that's inspiring. Good shot. Thanks. good shot thanks"
},
"video": {
"content": "Two men are engaged in a conversation at a table in an office setting. The man on the left, wearing a light gray shirt, listens attentively while the man on the right, dressed in a blue jacket over a black shirt, speaks animatedly. He uses hand gestures to emphasize his points, occasionally clasping his hands together on the table. The background features large windows with a view of a cloudy sky, and the room is well-lit with natural light."
}
}
]
}
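Once the JSON is parsed into a dictionary, the segments can be flattened into a readable, timestamped transcript. The sketch below assumes only the field names visible in the example output (`segments`, `start_time`, `end_time`, `audio.content`). The helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS (fractions of a second are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_transcript(result: dict) -> str:
    """Join the audio content of every segment into one timestamped transcript."""
    lines = []
    for segment in result["segments"]:
        start = format_timestamp(segment["start_time"])
        end = format_timestamp(segment["end_time"])
        text = segment["audio"]["content"].strip()
        lines.append(f"[{start} - {end}] {text}")
    return "\n".join(lines)
```

Applied to the output above, the first line would start with `[00:00:00 - 00:00:25]`, followed by the opening sentence of the episode.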
Use Cases
- Content Search: Make audio/video content searchable through transcription
- Meeting Intelligence: Extract action items and key points from meeting recordings
- Media Monitoring: Analyze news broadcasts and identify topics and speakers
- Educational Content: Structure course lectures with chapters and topics
- Podcast Production: Generate show notes, summaries, and topic timestamps
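As an illustration of the content-search use case, the following sketch scans both the audio and video content of each segment for a keyword and returns the matching time ranges. It assumes the segment layout shown in the example output. `search_segments` is a hypothetical helper, and a plain substring match stands in for whatever search backend you would use in practice.

```python
def search_segments(result: dict, query: str) -> list[tuple[float, float, str]]:
    """Return (start_time, end_time, modality) for each segment field containing `query`."""
    needle = query.casefold()
    hits = []
    for segment in result.get("segments", []):
        for modality in ("audio", "video"):
            # Tolerate missing or null fields.
            content = (segment.get(modality) or {}).get("content") or ""
            if needle in content.casefold():
                hits.append((segment["start_time"], segment["end_time"], modality))
    return hits
```

Because every hit carries its start and end time, a player can jump straight to the relevant moment in the recording.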
By leveraging VLM Run’s long-context output capabilities, you can efficiently extract structured information from extended audio and video content that would otherwise exceed traditional token limits.
Try our Video / Audio -> JSON API today
Head over to our Video -> JSON or Audio -> JSON pages to start building your own video/audio processing pipelines with VLM Run. Sign up for access on our platform.