The Responses API is Cloudglue’s next-generation conversational video interface. Compatible with the OpenAI Responses API format, it provides richer annotations, built-in multi-turn conversations, system instructions, streaming, and background processing — all grounded in your video content.
The Responses API works with both Media Description Collections (speech, visual, text) and Entity Collections (structured extracted data). Combine both for the richest video understanding.

When to Use Responses API vs Chat Completions

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Multi-turn conversations | Manual message history | Built-in message array |
| Streaming | Not available | SSE streaming |
| Background processing | Not available | background: true with polling |
| System instructions | System message in array | Dedicated instructions param |
| Entity-backed knowledge | Not available | nimbus-002-preview with entity collections |
| Annotations/Citations | citations array | Rich annotations with timestamps |
| Models | nimbus-001 | nimbus-001, nimbus-002-preview |
| API compatibility | Custom format | OpenAI Responses-compatible |
Use Chat Completions if you have an existing integration and only need basic Q&A over media description collections. Use Responses API for new projects, streaming UIs, entity-backed reasoning, or when you need background processing.

Model Selection

nimbus-001

Fast general question answering model. Works with media description collections for speech, visual, and text understanding. Good for straightforward Q&A and summarization.

nimbus-002-preview

Light reasoning model that can perform multi-step reasoning and inspect your video assets along multiple dimensions. In addition to media description collections, it supports entity-backed knowledge — combining structured entity data with unstructured video descriptions for richer, more precise answers.
nimbus-002-preview is a preview model. Behavior may change as we iterate.
When to use which:
  • nimbus-001 — Fast Q&A, summarization, general questions over video content
  • nimbus-002-preview — Multi-step reasoning, cross-video synthesis, queries that need structured + unstructured data together

Basic Response (Sync)

The simplest usage — send a question and get a complete response.
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
)

print(response.output[0].content[0].text)
The Python SDK uses a top-level collections parameter, while the TypeScript SDK nests it under knowledge_base: { collections: [...] }. Both achieve the same result — the difference is SDK-specific.

Streaming Responses

For real-time UIs, stream the response as it’s generated via Server-Sent Events (SSE). The JavaScript SDK uses web-standard APIs (fetch + ReadableStream) internally, so streaming works in both Node.js 18+ and modern browsers — including Next.js client components, React apps, and other browser environments. The stream emits three event types:
  • response.output_text.delta — Incremental text chunks
  • response.completed — Final event with the full response object (including annotations)
  • error — Error event if something goes wrong
from cloudglue import CloudGlue

client = CloudGlue()

events = client.responses.create(
    input="What are the key topics in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    stream=True,
)

for event in events:
    evt_type = event.get("event")
    data = event.get("data")

    if evt_type == "response.output_text.delta" and isinstance(data, dict):
        print(data.get("delta", ""), end="", flush=True)
    elif evt_type == "response.completed":
        print()  # final newline
    elif evt_type == "error":
        print(f"Error: {data}")
Streaming and background mode cannot be used together. Setting both stream: true and background: true returns a 400 error.
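If you route requests through your own wrapper, it can be worth rejecting this invalid combination client-side before the request is sent. A minimal sketch (the wrapper function is illustrative, not part of the SDK):

def create_response(client, **kwargs):
    # Illustrative guard, not part of the SDK: the API rejects this
    # combination with a 400, so fail fast before making the request.
    if kwargs.get("stream") and kwargs.get("background"):
        raise ValueError("stream and background cannot both be true")
    return client.responses.create(**kwargs)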

Entity-Backed Knowledge

This is the key differentiator of nimbus-002-preview. Entity-backed knowledge lets the model reason over both your structured entity data (extracted schemas) and unstructured media descriptions simultaneously. For example, if you have an entity collection with extracted recipe schemas (ingredients, cook times, difficulty) and a media description collection with the full video transcripts, the model can answer questions like “Which beginner recipes take under 30 minutes?” by combining both data sources.

How It Works

  1. Create an entity collection with your extraction schema (see Entity Collections)
  2. Create a media description collection with your videos
  3. Pass both to the Responses API via entity_backed_knowledge configuration
The Python SDK provides helper methods for building entity-backed knowledge configs:
from cloudglue import CloudGlue

client = CloudGlue()

# Build entity collection config using helper method
entity_config = client.responses.create_entity_backed_knowledge_config(
    entity_collections=[
        client.responses.create_entity_collection_config(
            collection_id="ENTITY_COLLECTION_ID",
            name="recipes",
            description="Recipe details including ingredients, cook time, and difficulty",
        )
    ],
    description="Cooking videos with structured recipe data",
)

response = client.responses.create(
    input="Which recipes require the fewest ingredients?",
    collections=["MEDIA_DESCRIPTION_COLLECTION_ID"],
    model="nimbus-002-preview",
    knowledge_base_type="entity_backed_knowledge",
    entity_backed_knowledge_config=entity_config,
)

print(response.output[0].content[0].text)

Configuration Fields

entity_backed_knowledge_config (top-level)

| Field | Required | Description |
| --- | --- | --- |
| entity_collections | Yes | Array of entity collection configs (see below) |
| description | No | Describes the overall knowledge base — gives the model context on what these collections represent and how they should be used together |
The top-level description on entity_backed_knowledge_config is important for guiding the model’s reasoning. For example, "Sales call recordings from Q4 2024 with deal outcomes and customer feedback" helps the model understand the domain and tailor its analysis. Without it, the model only sees individual collection descriptions and may miss the bigger picture.

Entity collection config (per-collection)

| Field | Required | Description |
| --- | --- | --- |
| collection_id | Yes | ID of the entity collection |
| name | Yes | Short identifier for the collection (e.g., "recipes", "speakers") |
| description | Yes | Describes what entities are in this collection — helps the model understand when to use it |
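The helper methods shown earlier simply build this structure for you. If you prefer plain dictionaries, the config maps directly to the fields in these tables. A minimal sketch, assuming the create call accepts the same raw structure the helpers produce:

# Hedged raw-dict equivalent of the helper-built config above;
# field names follow the tables in this section.
entity_config = {
    "entity_collections": [
        {
            "collection_id": "ENTITY_COLLECTION_ID",
            "name": "recipes",
            "description": "Recipe details including ingredients, cook time, and difficulty",
        }
    ],
    "description": "Cooking videos with structured recipe data",
}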

Entity-Backed Knowledge + Streaming

You can combine entity-backed knowledge with streaming for real-time entity-aware responses:
from cloudglue import CloudGlue

client = CloudGlue()

entity_config = client.responses.create_entity_backed_knowledge_config(
    entity_collections=[
        client.responses.create_entity_collection_config(
            collection_id="ENTITY_COLLECTION_ID",
            name="recipes",
            description="Structured recipe data from cooking videos",
        )
    ],
    description="Cooking videos with recipes and techniques",
)

events = client.responses.create(
    input="What dishes are mentioned and who discussed them?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    stream=True,
    knowledge_base_type="entity_backed_knowledge",
    entity_backed_knowledge_config=entity_config,
)

for event in events:
    evt_type = event.get("event")
    data = event.get("data")
    if evt_type == "response.output_text.delta" and isinstance(data, dict):
        print(data.get("delta", ""), end="", flush=True)
    elif evt_type == "response.completed":
        print()

Multi-Turn Conversations

The input parameter accepts either a string (single question) or a message array (multi-turn conversation). For multi-turn, include the full conversation history.
from cloudglue import CloudGlue

client = CloudGlue()

# Turn 1: initial question
resp1 = client.responses.create(
    input="What topics are discussed in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
)
turn1_text = resp1.output[0].content[0].text
print("Turn 1:", turn1_text)

# Turn 2: follow-up with conversation history
resp2 = client.responses.create(
    input=[
        {"role": "user", "content": "What topics are discussed in these videos?"},
        {"role": "assistant", "content": turn1_text},
        {"role": "user", "content": "Can you go into more detail about the first topic?"},
    ],
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
)
print("Turn 2:", resp2.output[0].content[0].text)
The Python SDK uses a simplified message format ({"role": ..., "content": ...}) while the TypeScript SDK uses the full OpenAI Responses format with type: 'message' and structured content arrays.
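One simple pattern for keeping history in sync across turns is to wrap the create call in a small helper. A sketch using the simplified Python message format above (the ask helper is illustrative, not part of the SDK):

from cloudglue import CloudGlue

client = CloudGlue()
history = []  # accumulated {"role": ..., "content": ...} messages


def ask(question):
    # Illustrative helper: append the user turn, send the full history,
    # then record the assistant's reply so the next call has context.
    history.append({"role": "user", "content": question})
    response = client.responses.create(
        input=history,
        collections=["COLLECTION_ID"],
        model="nimbus-002-preview",
    )
    answer = response.output[0].content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer


print("Turn 1:", ask("What topics are discussed in these videos?"))
print("Turn 2:", ask("Can you go into more detail about the first topic?"))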

Instructions

Use the instructions parameter to control response behavior — set a persona, language, format, or domain constraints.
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    instructions="Always respond in Spanish. Use bullet points for lists.",
)

Background Processing + Polling

For long-running queries or batch workflows, use background: true. The response returns immediately with status in_progress, and you poll for completion.
import time
from cloudglue import CloudGlue

client = CloudGlue()

# Start background response
response = client.responses.create(
    input="Provide a comprehensive analysis of all techniques shown.",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    background=True,
)
print(f"Started: {response.id}, status: {response.status}")  # "in_progress"

# Poll until complete
max_attempts = 30
for attempt in range(max_attempts):
    result = client.responses.get(response.id)
    if result.status == "completed":
        print(result.output[0].content[0].text)
        break
    elif result.status in ("failed", "cancelled"):
        print(f"Response ended with status: {result.status}")
        break
    time.sleep(2)
else:
    print(f"Polling timed out after {max_attempts} attempts")

# Or cancel if needed
# cancelled = client.responses.cancel(response.id)

Rich Citations

Request detailed media description annotations on citations by passing the include parameter. This returns speech transcripts, visual scene descriptions, and scene text for each cited segment.
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    include=["cloudglue_citations.media_descriptions"],
)

text = response.output[0].content[0].text
annotations = response.output[0].content[0].annotations

for ann in annotations:
    print(f"[{ann.start_time}s - {ann.end_time}s] {ann.file_id}")
    if ann.speech:
        for s in ann.speech:
            print(f"  Speaker {s.speaker}: {s.text}")
    if ann.visual_scene_description:
        for v in ann.visual_scene_description:
            print(f"  Visual: {v.text}")

Filters

Constrain which videos are searched using metadata, file, or video info filters.
The Python SDK provides a create_filter() helper:
from cloudglue import CloudGlue

client = CloudGlue()

# Filter by file metadata
search_filter = client.responses.create_filter(
    metadata=[
        {"path": "topic", "operator": "Equal", "valueText": "cooking"}
    ]
)

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
    filter=search_filter,
)

# Combine multiple filter types
combined_filter = client.responses.create_filter(
    metadata=[
        {"path": "cuisine", "operator": "Equal", "valueText": "Italian"}
    ],
    video_info=[
        {"path": "duration_seconds", "operator": "LessThan", "valueText": "600"}
    ],
)

Supported Filter Operations

| Operator | Description | Value Field |
| --- | --- | --- |
| Equal / NotEqual | Exact match | valueText |
| LessThan / GreaterThan | Numeric comparison | valueText |
| In | Value in comma-separated list | valueText |
| ContainsAny / ContainsAll | Array operations | valueTextArray |

Filter Categories

  • metadata — Filter on custom metadata fields (e.g., metadata.topic)
  • file — Filter on file properties (e.g., id)
  • video_info — Filter on video properties (e.g., duration_seconds, has_audio)
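The code examples above only exercise Equal and LessThan. A hedged sketch of the remaining operators, assuming create_filter accepts file filters via a file keyword mirroring metadata and video_info (the "tags" field path is hypothetical):

from cloudglue import CloudGlue

client = CloudGlue()

# ContainsAny takes valueTextArray rather than valueText (see the operator
# table above); the "tags" metadata field is hypothetical.
tag_filter = client.responses.create_filter(
    metadata=[
        {"path": "tags", "operator": "ContainsAny", "valueTextArray": ["pasta", "dessert"]}
    ]
)

# In matches against a comma-separated list in valueText; the file keyword
# is an assumption based on the filter categories above.
file_filter = client.responses.create_filter(
    file=[
        {"path": "id", "operator": "In", "valueText": "FILE_ID_1,FILE_ID_2"}
    ]
)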

Best Practices

Model Selection

  • Start with nimbus-001 for fast Q&A and summarization
  • Use nimbus-002-preview when you need multi-step reasoning, cross-video synthesis, or entity-backed knowledge
  • Both models support streaming, background processing, multi-turn, and instructions

Streaming vs Background

  • Streaming for interactive UIs where users see text appear in real-time
  • Background for batch processing, long-running analysis, or server-to-server workflows
  • You cannot use both simultaneously

Entity Collections

  • Set a top-level description on entity_backed_knowledge_config to give the model overall context — e.g., "Customer support calls from enterprise accounts in Q1 2025" helps the model frame its reasoning across all collections
  • Give each entity collection descriptive name and description values — the model uses these to decide when and how to query each collection
  • A name like "recipes" with a description like "Ingredients, cook times, and difficulty levels for each dish" is more helpful than "data" with no description

Multi-Turn Conversations

  • Include the full conversation history in each request for proper context
  • For long conversations, consider trimming older turns to stay within token limits (see the sketch after this list)
  • Use instructions to set consistent behavior across turns rather than repeating guidance in each message
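A minimal trimming sketch, per the note above (the helper is illustrative, not part of the SDK):

def trim_history(history, max_messages=8):
    # Illustrative: keep only the most recent messages, then drop a leading
    # assistant message so the trimmed history still starts with a user turn.
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed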

Citations

  • Use include: ["cloudglue_citations.media_descriptions"] when you need to display source segments to users or verify answers
  • Citation annotations include timestamps, speech, visual descriptions, and scene text — useful for building “jump to source” features
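For instance, a minimal sketch that renders annotation timestamps as jump labels (reusing the annotations list from the Rich Citations example; wiring the labels to your player is up to you):

def format_timestamp(seconds):
    # Illustrative: render seconds as M:SS for a "jump to source" label
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

# annotations comes from the Rich Citations example above
for ann in annotations:
    print(f"Jump to {format_timestamp(ann.start_time)} in file {ann.file_id}")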

Try It Out

Ready to build with the Responses API?