The Responses API is Cloudglue’s next-generation conversational video interface. Compatible with the OpenAI Responses API format, it provides richer annotations, built-in multi-turn conversations, system instructions, streaming, and background processing — all grounded in your video content.
The Responses API works with both Media Description Collections (speech, visual, text) and Entity Collections (structured extracted data). Combine both for the richest video understanding.
When to Use Responses API vs Chat Completions
| Feature | Chat Completions | Responses API |
|---|---|---|
| Multi-turn conversations | Manual message history | Built-in message array |
| Streaming | Not available | SSE streaming |
| Background processing | Not available | background: true with polling |
| System instructions | System message in array | Dedicated instructions param |
| Entity-backed knowledge | Not available | nimbus-002-preview with entity collections |
| Annotations/Citations | citations array | Rich annotations with timestamps |
| Models | nimbus-001 | nimbus-001, nimbus-002-preview |
| API compatibility | Custom format | OpenAI Responses-compatible |
Use Chat Completions if you have an existing integration and only need basic Q&A over media description collections.
Use Responses API for new projects, streaming UIs, entity-backed reasoning, or when you need background processing.
Model Selection
nimbus-001
Fast general question answering model. Works with media description collections for speech, visual, and text understanding. Good for straightforward Q&A and summarization.
nimbus-002-preview
Light reasoning model capable of multi-step reasoning and of inspecting your video assets along multiple dimensions. In addition to media description collections, it supports entity-backed knowledge — combining structured entity data with unstructured video descriptions for richer, more precise answers.
nimbus-002-preview is a preview model. Behavior may change as we iterate.
When to use which:
- nimbus-001 — Fast Q&A, summarization, general questions over video content
- nimbus-002-preview — Multi-step reasoning, cross-video synthesis, queries that need structured + unstructured data together
Basic Response (Sync)
The simplest usage — send a question and get a complete response.
```python
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
)

print(response.output[0].content[0].text)
```
```typescript
import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue();

const response = await client.responses.createResponse({
  model: 'nimbus-001',
  input: 'What techniques are discussed in these videos?',
  knowledge_base: { collections: ['COLLECTION_ID'] },
});

console.log(response.output?.[0]?.content?.[0]?.text);
```
The Python SDK uses a top-level collections parameter, while the TypeScript SDK nests it under knowledge_base: { collections: [...] }. Both achieve the same result — the difference is SDK-specific.
Streaming Responses
For real-time UIs, stream the response as it’s generated via Server-Sent Events (SSE). The JavaScript SDK uses web-standard APIs (fetch + ReadableStream) internally, so streaming works in both Node.js 18+ and modern browsers — including Next.js client components, React apps, and other browser environments.
The stream emits three event types:
- response.output_text.delta — Incremental text chunks
- response.completed — Final event with the full response object (including annotations)
- error — Error event if something goes wrong
```python
from cloudglue import CloudGlue

client = CloudGlue()

events = client.responses.create(
    input="What are the key topics in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    stream=True,
)

for event in events:
    evt_type = event.get("event")
    data = event.get("data")
    if evt_type == "response.output_text.delta" and isinstance(data, dict):
        print(data.get("delta", ""), end="", flush=True)
    elif evt_type == "response.completed":
        print()  # final newline
    elif evt_type == "error":
        print(f"Error: {data}")
```
```typescript
import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue();

const stream = await client.responses.createStreamingResponse({
  model: 'nimbus-002-preview',
  input: 'What are the key topics in these videos?',
  knowledge_base: { collections: ['COLLECTION_ID'] },
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  } else if (event.type === 'response.completed') {
    console.log('\n');
    // Access annotations from the completed response
    const annotations = event.response?.output?.[0]?.content?.[0]?.annotations;
    if (annotations?.length) {
      console.log(`[${annotations.length} citation(s)]`);
    }
  } else if (event.type === 'error') {
    console.error(`Error: ${event.error.message}`);
  }
}
```
Streaming and background mode cannot be used together. Setting both stream: true and background: true returns a 400 error.
Entity-Backed Knowledge
This is the key differentiator of nimbus-002-preview. Entity-backed knowledge lets the model reason over both your structured entity data (extracted schemas) and unstructured media descriptions simultaneously.
For example, if you have an entity collection with extracted recipe schemas (ingredients, cook times, difficulty) and a media description collection with the full video transcripts, the model can answer questions like “Which beginner recipes take under 30 minutes?” by combining both data sources.
How It Works
- Create an entity collection with your extraction schema (see Entity Collections)
- Create a media description collection with your videos
- Pass both to the Responses API via the entity_backed_knowledge configuration
The Python SDK provides helper methods for building entity-backed knowledge configs:

```python
from cloudglue import CloudGlue

client = CloudGlue()

# Build entity collection config using helper methods
entity_config = client.responses.create_entity_backed_knowledge_config(
    entity_collections=[
        client.responses.create_entity_collection_config(
            collection_id="ENTITY_COLLECTION_ID",
            name="recipes",
            description="Recipe details including ingredients, cook time, and difficulty",
        )
    ],
    description="Cooking videos with structured recipe data",
)

response = client.responses.create(
    input="Which recipes require the fewest ingredients?",
    collections=["MEDIA_DESCRIPTION_COLLECTION_ID"],
    model="nimbus-002-preview",
    knowledge_base_type="entity_backed_knowledge",
    entity_backed_knowledge_config=entity_config,
)

print(response.output[0].content[0].text)
```
In TypeScript, pass entity configuration as a raw object in knowledge_base:

```typescript
import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue();

const response = await client.responses.createResponse({
  model: 'nimbus-002-preview',
  input: 'Which recipes require the fewest ingredients?',
  knowledge_base: {
    type: 'entity_backed_knowledge',
    collections: ['MEDIA_DESCRIPTION_COLLECTION_ID'],
    entity_backed_knowledge_config: {
      description: 'Cooking videos with structured recipe data',
      entity_collections: [
        {
          name: 'recipes',
          description: 'Recipe details including ingredients, cook time, and difficulty',
          collection_id: 'ENTITY_COLLECTION_ID',
        },
      ],
    },
  },
});

console.log(response.output?.[0]?.content?.[0]?.text);
```
Configuration Fields
entity_backed_knowledge_config (top-level)
| Field | Required | Description |
|---|---|---|
| entity_collections | Yes | Array of entity collection configs (see below) |
| description | No | Describes the overall knowledge base — gives the model context on what these collections represent and how they should be used together |
The top-level description on entity_backed_knowledge_config is important for guiding the model’s reasoning. For example, "Sales call recordings from Q4 2024 with deal outcomes and customer feedback" helps the model understand the domain and tailor its analysis. Without it, the model only sees individual collection descriptions and may miss the bigger picture.
Entity collection config (per-collection)
| Field | Required | Description |
|---|---|---|
| collection_id | Yes | ID of the entity collection |
| name | Yes | Short identifier for the collection (e.g., "recipes", "speakers") |
| description | Yes | Describes what entities are in this collection — helps the model understand when to use it |
Entity-Backed Knowledge + Streaming
You can combine entity-backed knowledge with streaming for real-time entity-aware responses:
```python
from cloudglue import CloudGlue

client = CloudGlue()

entity_config = client.responses.create_entity_backed_knowledge_config(
    entity_collections=[
        client.responses.create_entity_collection_config(
            collection_id="ENTITY_COLLECTION_ID",
            name="recipes",
            description="Structured recipe data from cooking videos",
        )
    ],
    description="Cooking videos with recipes and techniques",
)

events = client.responses.create(
    input="What dishes are mentioned and who discussed them?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    stream=True,
    knowledge_base_type="entity_backed_knowledge",
    entity_backed_knowledge_config=entity_config,
)

for event in events:
    evt_type = event.get("event")
    data = event.get("data")
    if evt_type == "response.output_text.delta" and isinstance(data, dict):
        print(data.get("delta", ""), end="", flush=True)
    elif evt_type == "response.completed":
        print()
```
```typescript
import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue();

const stream = await client.responses.createStreamingResponse({
  model: 'nimbus-002-preview',
  input: 'What dishes are mentioned and who discussed them?',
  knowledge_base: {
    type: 'entity_backed_knowledge',
    collections: ['COLLECTION_ID'],
    entity_backed_knowledge_config: {
      description: 'Cooking videos with recipes and techniques',
      entity_collections: [
        {
          name: 'recipes',
          description: 'Structured recipe data from cooking videos',
          collection_id: 'ENTITY_COLLECTION_ID',
        },
      ],
    },
  },
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  } else if (event.type === 'response.completed') {
    console.log('\n');
  }
}
```
Multi-Turn Conversations
The input parameter accepts either a string (single question) or a message array (multi-turn conversation). For multi-turn, include the full conversation history.
```python
from cloudglue import CloudGlue

client = CloudGlue()

# Turn 1: initial question
resp1 = client.responses.create(
    input="What topics are discussed in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
)
turn1_text = resp1.output[0].content[0].text
print("Turn 1:", turn1_text)

# Turn 2: follow-up with conversation history
resp2 = client.responses.create(
    input=[
        {"role": "user", "content": "What topics are discussed in these videos?"},
        {"role": "assistant", "content": turn1_text},
        {"role": "user", "content": "Can you go into more detail about the first topic?"},
    ],
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
)
print("Turn 2:", resp2.output[0].content[0].text)
```
```typescript
import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue();

// Turn 1: initial question
const turn1 = await client.responses.createResponse({
  model: 'nimbus-002-preview',
  input: 'What topics are discussed in these videos?',
  knowledge_base: { collections: ['COLLECTION_ID'] },
});
const turn1Text = turn1.output?.[0]?.content?.[0]?.text ?? '';
console.log('Turn 1:', turn1Text);

// Turn 2: follow-up with conversation history
const turn2 = await client.responses.createResponse({
  model: 'nimbus-002-preview',
  input: [
    {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'What topics are discussed in these videos?' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'input_text', text: turn1Text }],
    },
    {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'Can you go into more detail about the first topic?' }],
    },
  ],
  knowledge_base: { collections: ['COLLECTION_ID'] },
});
console.log('Turn 2:', turn2.output?.[0]?.content?.[0]?.text);
```
The Python SDK uses a simplified message format ({"role": ..., "content": ...}) while the TypeScript SDK uses the full OpenAI Responses format with type: 'message' and structured content arrays.
Instructions
Use the instructions parameter to control response behavior — set a persona, language, format, or domain constraints.
```python
response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    instructions="Always respond in Spanish. Use bullet points for lists.",
)
```
```typescript
const response = await client.responses.createResponse({
  model: 'nimbus-002-preview',
  input: 'What techniques are discussed?',
  knowledge_base: { collections: ['COLLECTION_ID'] },
  instructions: 'Always respond in Spanish. Use bullet points for lists.',
});
```
Background Processing + Polling
For long-running queries or batch workflows, use background: true. The response returns immediately with status in_progress, and you poll for completion.
```python
import time

from cloudglue import CloudGlue

client = CloudGlue()

# Start background response
response = client.responses.create(
    input="Provide a comprehensive analysis of all techniques shown.",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    background=True,
)
print(f"Started: {response.id}, status: {response.status}")  # "in_progress"

# Poll until complete
max_attempts = 30
for attempt in range(max_attempts):
    result = client.responses.get(response.id)
    if result.status == "completed":
        print(result.output[0].content[0].text)
        break
    elif result.status in ("failed", "cancelled"):
        print(f"Response ended with status: {result.status}")
        break
    time.sleep(2)
else:
    print(f"Polling timed out after {max_attempts} attempts")

# Or cancel if needed
# cancelled = client.responses.cancel(response.id)
```
```typescript
import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue();

// Start background response
const response = await client.responses.createResponse({
  model: 'nimbus-002-preview',
  input: 'Provide a comprehensive analysis of all techniques shown.',
  knowledge_base: { collections: ['COLLECTION_ID'] },
  background: true,
});
console.log(`Started: ${response.id}, status: ${response.status}`); // "in_progress"

// Poll using the SDK helper
const result = await client.responses.waitForReady(response.id, {
  pollingInterval: 2000,
  maxAttempts: 30,
});
console.log(result.output?.[0]?.content?.[0]?.text);

// Or cancel if needed
// const cancelled = await client.responses.cancelResponse(response.id);
```
Rich Citations
Request detailed media description annotations on citations by passing the include parameter. This returns speech transcripts, visual scene descriptions, and scene text for each cited segment.
```python
response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    include=["cloudglue_citations.media_descriptions"],
)

text = response.output[0].content[0].text
annotations = response.output[0].content[0].annotations

for ann in annotations:
    print(f"[{ann.start_time}s - {ann.end_time}s] {ann.file_id}")
    if ann.speech:
        for s in ann.speech:
            print(f"  Speaker {s.speaker}: {s.text}")
    if ann.visual_scene_description:
        for v in ann.visual_scene_description:
            print(f"  Visual: {v.text}")
```
```typescript
const response = await client.responses.createResponse({
  model: 'nimbus-002-preview',
  input: 'What techniques are discussed?',
  knowledge_base: { collections: ['COLLECTION_ID'] },
  include: ['cloudglue_citations.media_descriptions'],
});

const annotations = response.output?.[0]?.content?.[0]?.annotations ?? [];
for (const ann of annotations) {
  console.log(`[${ann.start_time}s - ${ann.end_time}s] ${ann.file_id}`);
  ann.speech?.forEach((s: any) => {
    console.log(`  Speaker ${s.speaker}: ${s.text}`);
  });
  ann.visual_scene_description?.forEach((v: any) => {
    console.log(`  Visual: ${v.text}`);
  });
}
```
Filters
Constrain which videos are searched using metadata, file, or video info filters.
The Python SDK provides a create_filter() helper:

```python
from cloudglue import CloudGlue

client = CloudGlue()

# Filter by file metadata
search_filter = client.responses.create_filter(
    metadata=[
        {"path": "topic", "operator": "Equal", "valueText": "cooking"}
    ]
)

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
    filter=search_filter,
)

# Combine multiple filter types
combined_filter = client.responses.create_filter(
    metadata=[
        {"path": "cuisine", "operator": "Equal", "valueText": "Italian"}
    ],
    video_info=[
        {"path": "duration_seconds", "operator": "LessThan", "valueText": "600"}
    ],
)
```
In TypeScript, pass filter objects directly:

```typescript
const response = await client.responses.createResponse({
  model: 'nimbus-001',
  input: 'What techniques are discussed?',
  knowledge_base: { collections: ['COLLECTION_ID'] },
  filter: {
    metadata: [
      { path: 'topic', operator: 'Equal', valueText: 'cooking' },
    ],
  },
});
```
Supported Filter Operations
| Operator | Description | Value Field |
|---|---|---|
| Equal / NotEqual | Exact match | valueText |
| LessThan / GreaterThan | Numeric comparison | valueText |
| In | Value in comma-separated list | valueText |
| ContainsAny / ContainsAll | Array operations | valueTextArray |
Filter Categories
- metadata — Filter on custom metadata fields (e.g., metadata.topic)
- file — Filter on file properties (e.g., id)
- video_info — Filter on video properties (e.g., duration_seconds, has_audio)
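The examples above only exercise Equal and LessThan with valueText. Here is a minimal sketch of the array-style operators and the file category, assuming create_filter also accepts a file argument that mirrors the metadata and video_info arguments shown earlier; the "tags" field and file IDs are hypothetical placeholders:

```python
from cloudglue import CloudGlue

client = CloudGlue()

# Sketch only: "tags" and the file IDs are illustrative, and the file=
# argument is assumed to mirror metadata= and video_info= above.
advanced_filter = client.responses.create_filter(
    metadata=[
        # ContainsAny matches files whose "tags" array contains any listed value
        {"path": "tags", "operator": "ContainsAny", "valueTextArray": ["pasta", "dessert"]},
    ],
    file=[
        # In matches when the value appears in a comma-separated list
        {"path": "id", "operator": "In", "valueText": "FILE_ID_1,FILE_ID_2"},
    ],
)

response = client.responses.create(
    input="Which pasta dishes are covered?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
    filter=advanced_filter,
)
```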
Best Practices
Model Selection
- Start with nimbus-001 for fast Q&A and summarization
- Use nimbus-002-preview when you need multi-step reasoning, cross-video synthesis, or entity-backed knowledge
- Both models support streaming, background processing, multi-turn, and instructions
Streaming vs Background
- Streaming for interactive UIs where users see text appear in real-time
- Background for batch processing, long-running analysis, or server-to-server workflows
- You cannot use both simultaneously
Entity Collections
- Set a top-level description on entity_backed_knowledge_config to give the model overall context — e.g., "Customer support calls from enterprise accounts in Q1 2025" helps the model frame its reasoning across all collections
- Give each entity collection descriptive name and description values — the model uses these to decide when and how to query each collection
- A name like "recipes" with a description like "Ingredients, cook times, and difficulty levels for each dish" is more helpful than "data" with no description
Multi-Turn Conversations
- Include the full conversation history in each request for proper context
- For long conversations, consider trimming older turns to stay within token limits (see the sketch after this list)
- Use instructions to set consistent behavior across turns rather than repeating guidance in each message
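A minimal sketch of count-based trimming; the trim_history helper and MAX_MESSAGES budget are illustrative rather than part of the SDK, and a token-aware budget would be more precise:

```python
from cloudglue import CloudGlue

client = CloudGlue()

MAX_MESSAGES = 6  # arbitrary budget: keep the three most recent exchanges

def trim_history(messages):
    """Drop the oldest messages, keeping only the most recent MAX_MESSAGES."""
    return messages[-MAX_MESSAGES:]

# Accumulated history in the Python SDK's simplified message format
conversation = [
    {"role": "user", "content": "What topics are discussed in these videos?"},
    {"role": "assistant", "content": "...earlier answer..."},
    {"role": "user", "content": "Go deeper on the first topic."},
]

response = client.responses.create(
    input=trim_history(conversation),
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    instructions="Answer concisely and stay grounded in the videos.",
)
```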
Citations
- Use include: ["cloudglue_citations.media_descriptions"] when you need to display source segments to users or verify answers
- Citation annotations include timestamps, speech, visual descriptions, and scene text — useful for building “jump to source” features (see the sketch below)
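For example, a "jump to source" link can be assembled from an annotation's timestamps. A sketch assuming the response from the Rich Citations example above; the player URL scheme is a hypothetical placeholder:

```python
# Assumes `response` was created with
# include=["cloudglue_citations.media_descriptions"]; the URL pattern below
# is a placeholder for whatever video player your app uses.
for ann in response.output[0].content[0].annotations:
    url = f"https://player.example.com/videos/{ann.file_id}?t={int(ann.start_time)}"
    print(f"{ann.start_time}s - {ann.end_time}s: {url}")
```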
Try It Out
Ready to build with the Responses API?