Build intelligent conversations with video content using Rich Transcript Collections and grounded AI responses
Video contains a wealth of conversational potential, but extracting meaningful insights through natural language queries requires sophisticated understanding of both spoken content and visual context. While basic transcription gives you raw text, chat completions with Rich Transcript Collections allow you to have intelligent conversations with your video content.
With Cloudglue’s Chat Completion API and Rich Transcript Collections, you can ask natural language questions about your videos and receive contextually accurate responses grounded in the actual video content—including speech, visual scenes, and on-screen text.
Chat completions work with both locally uploaded videos and YouTube content, with multimodal understanding available for uploaded files.
Rich Transcript Collections are specialized collections that combine multiple layers of video understanding:
Unlike basic transcription, Rich Transcript Collections create a searchable knowledge base that enables semantic queries across all modalities of your video content.
Chat completions with Rich Transcript Collections are ideal when you need to:
The messages
array follows the standard chat completion format:
role
: “user”, “assistant”, or “system”content
: The message contentFor multi-turn conversations, include the full conversation history:
Cloudglue’s nimbus-001
is a specialized model optimized for:
The force_search
parameter controls whether the system searches your collections:
true
: Always searches collections before responding (recommended)false
: May respond from general knowledge without searchingCitations provide transparency and verifiability:
true
: Returns timestamps, file references, and content snippetsfalse
: Returns only the response textFor precise control over what content is searched, you can use metadata filters to target specific videos in your collection:
The filter
parameter allows you to constrain searches using file metadata:
path: JSON path to the metadata field (e.g., "metadata.custom_field"
or "video_info.has_audio"
)
operator: Comparison operator to apply
Equal
/ NotEqual
: Exact match comparisonLessThan
/ GreaterThan
: Numeric comparisonIn
: Check if value is in a comma-separated listContainsAny
/ ContainsAll
: Array operations (use valueTextArray
)valueText: Single value for scalar comparisons
valueTextArray: Array of values for array operations
Example metadata filter scenarios:
Let’s build a comprehensive example using cooking videos to demonstrate multi-turn conversations and advanced features.
Now let’s demonstrate a realistic conversation about pasta recipes. This example shows how to build a stateful chatbot that maintains conversation history and can handle follow-up questions. The implementation demonstrates proper conversation state management, error handling, and how to structure questions for optimal results from the nimbus-001 model.
Key features demonstrated in this example:
Here’s what a realistic conversation might look like:
You can guide the model’s behavior using system messages:
For complex queries, use metadata filters to target specific videos in your collection:
When you set include_citations: true
, the response includes detailed references to the specific video segments that informed the answer. This provides transparency and allows users to verify information or explore the original content.
Let’s examine what a real citation response looks like for the question “What cooking techniques are used for pasta dishes?”:
Each citation provides detailed information about the source:
Citations enable powerful functionality in your applications:
This citation system ensures that every response can be traced back to its original source, providing transparency and enabling users to explore the full context of the information.
Ready to start building conversational video experiences? Check out our Chat Completion API to get started with Rich Transcript Collections.
Experiment with different question types:
Get started on our platform and create your first Rich Transcript Collection today.
For production applications, consider implementing:
Build intelligent conversations with video content using Rich Transcript Collections and grounded AI responses
Video contains a wealth of conversational potential, but extracting meaningful insights through natural language queries requires sophisticated understanding of both spoken content and visual context. While basic transcription gives you raw text, chat completions with Rich Transcript Collections allow you to have intelligent conversations with your video content.
With Cloudglue’s Chat Completion API and Rich Transcript Collections, you can ask natural language questions about your videos and receive contextually accurate responses grounded in the actual video content—including speech, visual scenes, and on-screen text.
Chat completions work with both locally uploaded videos and YouTube content, with multimodal understanding available for uploaded files.
Rich Transcript Collections are specialized collections that combine multiple layers of video understanding:
Unlike basic transcription, Rich Transcript Collections create a searchable knowledge base that enables semantic queries across all modalities of your video content.
Chat completions with Rich Transcript Collections are ideal when you need to:
The messages
array follows the standard chat completion format:
role
: “user”, “assistant”, or “system”content
: The message contentFor multi-turn conversations, include the full conversation history:
Cloudglue’s nimbus-001
is a specialized model optimized for:
The force_search
parameter controls whether the system searches your collections:
true
: Always searches collections before responding (recommended)false
: May respond from general knowledge without searchingCitations provide transparency and verifiability:
true
: Returns timestamps, file references, and content snippetsfalse
: Returns only the response textFor precise control over what content is searched, you can use metadata filters to target specific videos in your collection:
The filter
parameter allows you to constrain searches using file metadata:
path: JSON path to the metadata field (e.g., "metadata.custom_field"
or "video_info.has_audio"
)
operator: Comparison operator to apply
Equal
/ NotEqual
: Exact match comparisonLessThan
/ GreaterThan
: Numeric comparisonIn
: Check if value is in a comma-separated listContainsAny
/ ContainsAll
: Array operations (use valueTextArray
)valueText: Single value for scalar comparisons
valueTextArray: Array of values for array operations
Example metadata filter scenarios:
Let’s build a comprehensive example using cooking videos to demonstrate multi-turn conversations and advanced features.
Now let’s demonstrate a realistic conversation about pasta recipes. This example shows how to build a stateful chatbot that maintains conversation history and can handle follow-up questions. The implementation demonstrates proper conversation state management, error handling, and how to structure questions for optimal results from the nimbus-001 model.
Key features demonstrated in this example:
Here’s what a realistic conversation might look like:
You can guide the model’s behavior using system messages:
For complex queries, use metadata filters to target specific videos in your collection:
When you set include_citations: true
, the response includes detailed references to the specific video segments that informed the answer. This provides transparency and allows users to verify information or explore the original content.
Let’s examine what a real citation response looks like for the question “What cooking techniques are used for pasta dishes?”:
Each citation provides detailed information about the source:
Citations enable powerful functionality in your applications:
This citation system ensures that every response can be traced back to its original source, providing transparency and enabling users to explore the full context of the information.
Ready to start building conversational video experiences? Check out our Chat Completion API to get started with Rich Transcript Collections.
Experiment with different question types:
Get started on our platform and create your first Rich Transcript Collection today.
For production applications, consider implementing: