The Basics
Describe Config
The describe config is a JSON object that controls what the description includes from the video/audio file. By default, the description includes the speech transcription and a summary of the video/audio file. You can also include visual scene descriptions and on-screen text or captions.
Example Config
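As a rough sketch of the defaults described above, a config could be assembled as follows. The key names (`enable_speech`, `enable_summary`, `enable_visual_scene_description`, `enable_scene_text`) and the helper `build_describe_config` are assumptions made for illustration. This page only states which categories exist and which are on by default, so check the API reference for the actual key names.

```python
import json

# Hypothetical key names: only the categories and their defaults come from the
# text above (speech and summary on; visual scenes and scene text opt-in).
DEFAULT_DESCRIBE_CONFIG = {
    "enable_speech": True,
    "enable_summary": True,
    "enable_visual_scene_description": False,
    "enable_scene_text": False,
}


def build_describe_config(**overrides):
    """Return a copy of the default config with the given flags overridden."""
    unknown = set(overrides) - set(DEFAULT_DESCRIBE_CONFIG)
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_DESCRIBE_CONFIG, **overrides}


# Opt in to the two categories that are off by default.
config = build_describe_config(
    enable_visual_scene_description=True,
    enable_scene_text=True,
)
print(json.dumps(config, indent=2))
```

Rejecting unknown keys up front is a design choice for this sketch: a misspelled flag fails loudly instead of being silently ignored.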
Here’s an example of the config you can use to describe a video/audio file.
Generating a Description
Example
All you need is a few lines of code to generate a description with Cloudglue.
- JavaScript SDK
- Python SDK
Description Outputs
Retrieving the description takes a single call.
Getting the Description
Examples
Here’s an example of the truncated output you get when you describe a video/audio file.
JSON Example
JSON Output
- title: The generated title of the video based on the descriptions generated.
- summary: A generated summary of the video based on the descriptions generated.
- speech: The speech transcription of the video.
- visual_scene_description: A description of the scene at different timestamps in the video.
- audio_description: A description of the audio at different timestamps in the video.
- scene_text: The on-screen text at different timestamps in the video.
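To illustrate how the fields listed above might be consumed, here is a minimal sketch that merges the timestamped fields into a single timeline. The top-level field names come from the list above. The sample values are made up, and the per-entry keys (`text`, `start_time`, `end_time`) are assumptions about the entry shape, so adjust them to match the real output.

```python
import json

# Made-up sample for illustration only. Per-entry keys are assumed.
sample_json = """
{
  "title": "Sample title",
  "summary": "Sample summary.",
  "speech": [
    {"text": "Hello and welcome.", "start_time": 0.0, "end_time": 2.5}
  ],
  "visual_scene_description": [
    {"text": "A person stands at a whiteboard.", "start_time": 0.0, "end_time": 5.0}
  ],
  "audio_description": [],
  "scene_text": [
    {"text": "Quarterly Review", "start_time": 1.0, "end_time": 4.0}
  ]
}
"""

TIMESTAMPED_FIELDS = (
    "speech",
    "visual_scene_description",
    "audio_description",
    "scene_text",
)


def flatten_timeline(description):
    """Merge all timestamped fields into one list sorted by start time."""
    entries = []
    for field in TIMESTAMPED_FIELDS:
        # A field may be missing or empty if it was not enabled in the config.
        for item in description.get(field) or []:
            entries.append((item["start_time"], field, item["text"]))
    return sorted(entries)


description = json.loads(sample_json)
timeline = flatten_timeline(description)

print(description["title"])
for start, field, text in timeline:
    print(f"[{start:6.1f}s] {field}: {text}")
```

Using `description.get(field) or []` keeps the sketch tolerant of categories that were not requested in the describe config.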
Markdown Example
The following is a truncated example of the markdown output.
Markdown Output
Looking for other categories of information from the video? Learn more about our extraction features and what they can do for you.
Key Features
- Speech transcription: Speech-to-text transcription of video/audio files.
- Scene text descriptions: On-screen text or captions from a video/audio file at different timestamps.
- Visual scene descriptions: Get descriptions of different scenes in a video, at different timestamps.
- Title + Summary: Get a generated title and summary of the video/audio file based on all the descriptions we have available.
- Markdown compatible: Descriptions can also be generated as markdown, so you can use them in LLMs right away.
Try it out
Check out our Describe Video Endpoint to get started with building your own video/audio processing with Cloudglue. Get started on our platform.
YouTube
At the moment, if you want to describe a video directly from YouTube, we only support generating speech transcriptions.