The Responses API is Cloudglue’s next-generation conversational video interface. Compatible with the OpenAI Responses API format, it provides richer annotations, built-in multi-turn conversations, system instructions, streaming, and background processing — all grounded in your video content.
The Responses API works with both Media Description Collections (speech, visual, text) and Entity Collections (structured extracted data). Combine both for the richest video understanding.

When to Use Responses API vs Chat Completions

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Multi-turn conversations | Manual message history | Built-in message array |
| Streaming | Not available | SSE streaming |
| Background processing | Not available | background: true with polling |
| System instructions | System message in array | Dedicated instructions param |
| Entity-backed knowledge | Not available | nimbus-002-preview with entity collections |
| Annotations/Citations | citations array | Rich annotations with timestamps |
| Models | nimbus-001 | nimbus-001, nimbus-002-preview |
| API compatibility | Custom format | OpenAI Responses-compatible |
Use Chat Completions if you have an existing integration and only need basic Q&A over media description collections. Use Responses API for new projects, streaming UIs, entity-backed reasoning, or when you need background processing.

Model Selection

nimbus-001

Fast general question answering model. Works with media description collections for speech, visual, and text understanding. Good for straightforward Q&A and summarization.

nimbus-002-preview

Light reasoning model that can perform multi-step reasoning and inspect your video assets along multiple dimensions. In addition to media description collections, it supports entity-backed knowledge — combining structured entity data with unstructured video descriptions for richer, more precise answers.
nimbus-002-preview is a preview model. Behavior may change as we iterate.
When to use which:
  • nimbus-001 — Fast Q&A, summarization, general questions over video content
  • nimbus-002-preview — Multi-step reasoning, cross-video synthesis, queries that need structured + unstructured data together

Basic Response (Sync)

The simplest usage — send a question and get a complete response.
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
)

print(response.output[0].content[0].text)
The Python SDK uses a top-level collections parameter, while the TypeScript SDK nests it under knowledge_base: { collections: [...] }. Both achieve the same result — the difference is SDK-specific.

Streaming Responses

For real-time UIs, stream the response as it’s generated via Server-Sent Events (SSE). The JavaScript SDK uses web-standard APIs (fetch + ReadableStream) internally, so streaming works in both Node.js 18+ and modern browsers — including Next.js client components, React apps, and other browser environments. The stream emits three event types:
  • response.output_text.delta — Incremental text chunks
  • response.completed — Final event with the full response object (including annotations)
  • error — Error event if something goes wrong
from cloudglue import CloudGlue

client = CloudGlue()

events = client.responses.create(
    input="What are the key topics in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    stream=True,
)

for event in events:
    evt_type = event.get("event")
    data = event.get("data")

    if evt_type == "response.output_text.delta" and isinstance(data, dict):
        print(data.get("delta", ""), end="", flush=True)
    elif evt_type == "response.completed":
        print()  # final newline
    elif evt_type == "error":
        print(f"Error: {data}")
Streaming and background mode cannot be used together. Setting both stream: true and background: true returns a 400 error.
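If you route requests through your own wrapper, it can be worth rejecting this invalid combination client-side before the request is sent. A minimal sketch (the wrapper function is illustrative, not part of the SDK):

def create_response(client, **kwargs):
    # Illustrative guard, not part of the SDK: the API rejects this
    # combination with a 400, so fail fast before making the request.
    if kwargs.get("stream") and kwargs.get("background"):
        raise ValueError("stream and background cannot both be true")
    return client.responses.create(**kwargs)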

Entity-Backed Knowledge

This is the key differentiator of nimbus-002-preview. Entity-backed knowledge lets the model reason over both your structured entity data (extracted schemas) and unstructured media descriptions simultaneously. For example, if you have an entity collection with extracted recipe schemas (ingredients, cook times, difficulty) and a media description collection with the full video transcripts, the model can answer questions like “Which beginner recipes take under 30 minutes?” by combining both data sources.

How It Works

  1. Create an entity collection with your extraction schema (see Entity Collections)
  2. Create a media description collection with your videos
  3. Pass both to the Responses API via entity_backed_knowledge configuration
The Python SDK provides helper methods for building entity-backed knowledge configs:
from cloudglue import CloudGlue

client = CloudGlue()

# Build entity collection config using helper method
entity_config = client.responses.create_entity_backed_knowledge_config(
    entity_collections=[
        client.responses.create_entity_collection_config(
            collection_id="ENTITY_COLLECTION_ID",
            name="recipes",
            description="Recipe details including ingredients, cook time, and difficulty",
        )
    ],
    description="Cooking videos with structured recipe data",
)

response = client.responses.create(
    input="Which recipes require the fewest ingredients?",
    collections=["MEDIA_DESCRIPTION_COLLECTION_ID"],
    model="nimbus-002-preview",
    knowledge_base_type="entity_backed_knowledge",
    entity_backed_knowledge_config=entity_config,
)

print(response.output[0].content[0].text)

Configuration Fields

entity_backed_knowledge_config (top-level)

| Field | Required | Description |
| --- | --- | --- |
| entity_collections | Yes | Array of entity collection configs (see below) |
| description | No | Describes the overall knowledge base — gives the model context on what these collections represent and how they should be used together |
The top-level description on entity_backed_knowledge_config is important for guiding the model’s reasoning. For example, "Sales call recordings from Q4 2024 with deal outcomes and customer feedback" helps the model understand the domain and tailor its analysis. Without it, the model only sees individual collection descriptions and may miss the bigger picture.

Entity collection config (per-collection)

| Field | Required | Description |
| --- | --- | --- |
| collection_id | Yes | ID of the entity collection |
| name | Yes | Short identifier for the collection (e.g., "recipes", "speakers") |
| description | Yes | Describes what entities are in this collection — helps the model understand when to use it |
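The helper methods shown earlier simply build this structure for you. If you prefer plain dictionaries, the config maps directly to the fields in these tables. A minimal sketch, assuming the create call accepts the same raw structure the helpers produce:

# Hedged raw-dict equivalent of the helper-built config above;
# field names follow the tables in this section.
entity_config = {
    "entity_collections": [
        {
            "collection_id": "ENTITY_COLLECTION_ID",
            "name": "recipes",
            "description": "Recipe details including ingredients, cook time, and difficulty",
        }
    ],
    "description": "Cooking videos with structured recipe data",
}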

Entity-Backed Knowledge + Streaming

You can combine entity-backed knowledge with streaming for real-time entity-aware responses:
from cloudglue import CloudGlue

client = CloudGlue()

entity_config = client.responses.create_entity_backed_knowledge_config(
    entity_collections=[
        client.responses.create_entity_collection_config(
            collection_id="ENTITY_COLLECTION_ID",
            name="recipes",
            description="Structured recipe data from cooking videos",
        )
    ],
    description="Cooking videos with recipes and techniques",
)

events = client.responses.create(
    input="What dishes are mentioned and who discussed them?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    stream=True,
    knowledge_base_type="entity_backed_knowledge",
    entity_backed_knowledge_config=entity_config,
)

for event in events:
    evt_type = event.get("event")
    data = event.get("data")
    if evt_type == "response.output_text.delta" and isinstance(data, dict):
        print(data.get("delta", ""), end="", flush=True)
    elif evt_type == "response.completed":
        print()

Multi-Turn Conversations

The input parameter accepts either a string (single question) or a message array (multi-turn conversation). For multi-turn, include the full conversation history.
from cloudglue import CloudGlue

client = CloudGlue()

# Turn 1: initial question
resp1 = client.responses.create(
    input="What topics are discussed in these videos?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
)
turn1_text = resp1.output[0].content[0].text
print("Turn 1:", turn1_text)

# Turn 2: follow-up with conversation history
resp2 = client.responses.create(
    input=[
        {"role": "user", "content": "What topics are discussed in these videos?"},
        {"role": "assistant", "content": turn1_text},
        {"role": "user", "content": "Can you go into more detail about the first topic?"},
    ],
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
)
print("Turn 2:", resp2.output[0].content[0].text)
The Python SDK uses a simplified message format ({"role": ..., "content": ...}) while the TypeScript SDK uses the full OpenAI Responses format with type: 'message' and structured content arrays.
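One simple pattern for keeping history in sync across turns is to wrap the create call in a small helper. A sketch using the simplified Python message format above (the ask helper is illustrative, not part of the SDK):

from cloudglue import CloudGlue

client = CloudGlue()
history = []  # accumulated {"role": ..., "content": ...} messages


def ask(question):
    # Illustrative helper: append the user turn, send the full history,
    # then record the assistant's reply so the next call has context.
    history.append({"role": "user", "content": question})
    response = client.responses.create(
        input=history,
        collections=["COLLECTION_ID"],
        model="nimbus-002-preview",
    )
    answer = response.output[0].content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer


print("Turn 1:", ask("What topics are discussed in these videos?"))
print("Turn 2:", ask("Can you go into more detail about the first topic?"))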

Instructions

Use the instructions parameter to control response behavior — set a persona, language, format, or domain constraints.
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    instructions="Always respond in Spanish. Use bullet points for lists.",
)

Background Processing + Polling

For long-running queries or batch workflows, use background: true. The response returns immediately with status in_progress, and you poll for completion.
import time
from cloudglue import CloudGlue

client = CloudGlue()

# Start background response
response = client.responses.create(
    input="Provide a comprehensive analysis of all techniques shown.",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    background=True,
)
print(f"Started: {response.id}, status: {response.status}")  # "in_progress"

# Poll until complete
max_attempts = 30
for attempt in range(max_attempts):
    result = client.responses.get(response.id)
    if result.status == "completed":
        print(result.output[0].content[0].text)
        break
    elif result.status in ("failed", "cancelled"):
        print(f"Response ended with status: {result.status}")
        break
    time.sleep(2)
else:
    print(f"Polling timed out after {max_attempts} attempts")

# Or cancel if needed
# cancelled = client.responses.cancel(response.id)

Rich Citations

Request detailed media description annotations on citations by passing the include parameter. This returns speech transcripts, visual scene descriptions, and scene text for each cited segment.
from cloudglue import CloudGlue

client = CloudGlue()

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-002-preview",
    include=["cloudglue_citations.media_descriptions"],
)

text = response.output[0].content[0].text
annotations = response.output[0].content[0].annotations

for ann in annotations:
    print(f"[{ann.start_time}s - {ann.end_time}s] {ann.file_id}")
    if ann.speech:
        for s in ann.speech:
            print(f"  Speaker {s.speaker}: {s.text}")
    if ann.visual_scene_description:
        for v in ann.visual_scene_description:
            print(f"  Visual: {v.text}")

Filters

Constrain which videos are searched using metadata, file, or video info filters.
The Python SDK provides a create_filter() helper:
from cloudglue import CloudGlue

client = CloudGlue()

# Filter by file metadata
search_filter = client.responses.create_filter(
    metadata=[
        {"path": "topic", "operator": "Equal", "valueText": "cooking"}
    ]
)

response = client.responses.create(
    input="What techniques are discussed?",
    collections=["COLLECTION_ID"],
    model="nimbus-001",
    filter=search_filter,
)

# Combine multiple filter types
combined_filter = client.responses.create_filter(
    metadata=[
        {"path": "cuisine", "operator": "Equal", "valueText": "Italian"}
    ],
    video_info=[
        {"path": "duration_seconds", "operator": "LessThan", "valueText": "600"}
    ],
)

Supported Filter Operations

| Operator | Description | Value Field |
| --- | --- | --- |
| Equal / NotEqual | Exact match | valueText |
| LessThan / GreaterThan | Numeric comparison | valueText |
| In | Value in comma-separated list | valueText |
| ContainsAny / ContainsAll | Array operations | valueTextArray |

Filter Categories

  • metadata — Filter on custom metadata fields (e.g., metadata.topic)
  • file — Filter on file properties (e.g., id)
  • video_info — Filter on video properties (e.g., duration_seconds, has_audio)
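The code examples above only exercise Equal and LessThan. A hedged sketch of the remaining operators, assuming create_filter accepts file filters via a file keyword mirroring metadata and video_info (the "tags" field path is hypothetical):

from cloudglue import CloudGlue

client = CloudGlue()

# ContainsAny takes valueTextArray rather than valueText (see the operator
# table above); the "tags" metadata field is hypothetical.
tag_filter = client.responses.create_filter(
    metadata=[
        {"path": "tags", "operator": "ContainsAny", "valueTextArray": ["pasta", "dessert"]}
    ]
)

# In matches against a comma-separated list in valueText; the file keyword
# is an assumption based on the filter categories above.
file_filter = client.responses.create_filter(
    file=[
        {"path": "id", "operator": "In", "valueText": "FILE_ID_1,FILE_ID_2"}
    ]
)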

Best Practices

Model Selection

  • Start with nimbus-001 for fast Q&A and summarization
  • Use nimbus-002-preview when you need multi-step reasoning, cross-video synthesis, or entity-backed knowledge
  • Both models support streaming, background processing, multi-turn, and instructions

Streaming vs Background

  • Streaming for interactive UIs where users see text appear in real-time
  • Background for batch processing, long-running analysis, or server-to-server workflows
  • You cannot use both simultaneously

Entity Collections

  • Set a top-level description on entity_backed_knowledge_config to give the model overall context — e.g., "Customer support calls from enterprise accounts in Q1 2025" helps the model frame its reasoning across all collections
  • Give each entity collection descriptive name and description values — the model uses these to decide when and how to query each collection
  • A name like "recipes" with a description like "Ingredients, cook times, and difficulty levels for each dish" is more helpful than "data" with no description

Multi-Turn Conversations

  • Include the full conversation history in each request for proper context
  • For long conversations, consider trimming older turns to stay within token limits (see the sketch after this list)
  • Use instructions to set consistent behavior across turns rather than repeating guidance in each message
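A minimal trimming sketch, per the note above (the helper is illustrative, not part of the SDK):

def trim_history(history, max_messages=8):
    # Illustrative: keep only the most recent messages, then drop a leading
    # assistant message so the trimmed history still starts with a user turn.
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] == "assistant":
        trimmed = trimmed[1:]
    return trimmed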

Citations

  • Use include: ["cloudglue_citations.media_descriptions"] when you need to display source segments to users or verify answers
  • Citation annotations include timestamps, speech, visual descriptions, and scene text — useful for building “jump to source” features
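For instance, a minimal sketch that renders annotation timestamps as jump labels (reusing the annotations list from the Rich Citations example; wiring the labels to your player is up to you):

def format_timestamp(seconds):
    # Illustrative: render seconds as M:SS for a "jump to source" label
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

# annotations comes from the Rich Citations example above
for ann in annotations:
    print(f"Jump to {format_timestamp(ann.start_time)} in file {ann.file_id}")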

Try It Out

Ready to build with the Responses API?