Installation

To install the Cloudglue TypeScript/JavaScript SDK, use npm:

npm install @aviaryhq/cloudglue-js

Usage

  1. Get an API key from cloudglue.dev
  2. Set the API key as an environment variable named CLOUDGLUE_API_KEY, or pass the API key as a parameter to the CloudGlue class constructor

Here’s an example of how to create a CloudGlue client:

import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue({
  apiKey: 'cg-your-api-key-here',
});
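
Alternatively, if you’ve set the CLOUDGLUE_API_KEY environment variable, you can construct the client without passing the key explicitly (depending on your SDK version, an empty options object may be required):

// Picks up the API key from the CLOUDGLUE_API_KEY environment variable
const clientFromEnv = new CloudGlue();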

Working with Files

Most CloudGlue APIs operate on files that have been uploaded to CloudGlue.

Below are the basics for working with files:

import * as fs from 'fs';
import * as path from 'path';

// See your current files
const myFiles = await client.files.listFiles({ limit: 10 });
console.log(myFiles);

// Upload a local file
const filePath = '/path/to/your/video.mp4';
const fileName = path.basename(filePath);

const fileBuffer = await fs.promises.readFile(filePath);
// `File` is a global in Node 20+; on older runtimes, import it from 'node:buffer'
const file = new File(
  [fileBuffer],
  fileName,
  { type: 'video/mp4' }, // Adjust MIME type based on your file type
);

const uploadResponse = await client.files.uploadFile({
  file,
  // Optional user-defined metadata that can be stored with the file
  metadata: {
    topics: ['education', 'software'],
  },
});

// Access the upload response
console.log(`File ID: ${uploadResponse.data.id}`);
console.log(`Status: ${uploadResponse.data.status}`);
console.log(`URI: ${uploadResponse.data.uri}`);

// You can also check status of a previously uploaded file
const fileInfo = await client.files.getFile(uploadResponse.data.id);
console.log(fileInfo);
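
Processing takes some time after upload, so you’ll often want to wait for a file to finish before using it elsewhere. Below is a minimal polling sketch; the status values ('completed', 'failed') and the exact response shape are assumptions, so adjust them to match your SDK version:

// Poll until the file reaches a terminal state
// (the 'completed'/'failed' status values are assumptions)
async function waitForFile(fileId: string, intervalMs = 5000) {
  for (;;) {
    const info = await client.files.getFile(fileId);
    if (info.status === 'completed') return info;
    if (info.status === 'failed') throw new Error(`File ${fileId} failed to process`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

const readyFile = await waitForFile(uploadResponse.data.id);
console.log(readyFile);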

Note: When uploading files, make sure to:

  1. Handle the file reading properly, with error checking
  2. Set the correct MIME type for your video file (e.g., ‘video/mp4’, ‘video/quicktime’)
  3. Include relevant metadata about the upload
  4. Handle potential upload errors appropriately (see the sketch below)
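
For example, a defensive upload might look like the following sketch, which wraps both the file read and the upload call in a single try/catch:

try {
  const buffer = await fs.promises.readFile(filePath); // throws if the path is missing or unreadable
  const upload = await client.files.uploadFile({
    file: new File([buffer], fileName, { type: 'video/mp4' }),
  });
  console.log(`Uploaded file ${upload.data.id}`);
} catch (err) {
  // Network/API failures from uploadFile also land here
  console.error('Upload failed:', err);
}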

Extracting Structured Information from Videos

Organizing information into structured entity schemas allows for response types that are easy to program AI applications against. Let’s get started with extracting structured entity information from videos.

Prompt Only Extraction

Below we’ll show how to extract information from a video using natural language to guide the entire process. This is particularly helpful during the exploratory phase, when your entity structure may not be completely known yet.

// Specify a prompt for what you are interested in extracting
const prompt =
  'Extract the restaurant name and specialty information from this video';

// Create an extract job (`myFile` refers to a previously uploaded file; see Working with Files)
const extractJob = await client.extract.createExtract(myFile.uri, {
  prompt: prompt,
});

// Check the status of the job
const updatedExtractJob = await client.extract.getExtract(extractJob.job_id);
console.log(updatedExtractJob);
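
Extraction runs asynchronously, so in practice you’ll poll getExtract until the job finishes. A minimal sketch, assuming the job exposes a status field that eventually reads 'completed' or 'failed' (the exact values may differ in your SDK version):

// Poll the extract job until it reaches a terminal state
async function waitForExtract(jobId: string, intervalMs = 5000) {
  for (;;) {
    const job = await client.extract.getExtract(jobId);
    if (job.status === 'completed') return job;
    if (job.status === 'failed') throw new Error(`Extract job ${jobId} failed`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

const finishedJob = await waitForExtract(extractJob.job_id);
console.log(finishedJob);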

YouTube URLs are also supported as input to this API. Note that YouTube videos are currently limited to speech- and metadata-level understanding; for fully fledged multimodal video understanding, upload a file to the CloudGlue Files API and use that as input instead.

// Extract entities from the spoken content of a YouTube video
const result = await client.extract.createExtract(
  'https://www.youtube.com/shorts/BLAH',
  { prompt: 'Extract the main talking points in this video' },
);
console.log(result);

Schema Driven Extraction

In CloudGlue you can direct the extraction process to return data in a specific format, which is helpful when your downstream application programs against specific fields or stores data in a database with a specific structure.

We allow users to specify schemas either as an example/abbreviated JSON object or as a fully fledged JSON object specification. For convenience, we also provide a graphical entity schema builder.

For example, with food review videos, let’s say we want to capture each restaurant’s name and a short blurb from the reviewer to store in our table, in which case your entity schema might look something like this:

const schema = {
  restaurants: [
    {
      name: 'string',
      blurb: 'blurb by reviewer on why this place matters',
    },
  ],
};

// Schema driven extraction
const result = await client.extract.createExtract(myFile.uri, {
  schema: schema,
  // Optionally, you can also specify a prompt to further guide the extraction
  // prompt: prompt
});
console.log(result);
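
Because the schema fixes the shape of the output, you can program against it with a TypeScript interface. A sketch, assuming the finished job exposes the extracted entities under a data field (that field name is an assumption; adjust it to your SDK version’s response shape):

interface Restaurant {
  name: string;
  blurb: string;
}

interface ExtractedEntities {
  restaurants: Restaurant[];
}

// The `data` field here is an assumption about the response shape
const extracted = result.data as ExtractedEntities;
for (const r of extracted.restaurants ?? []) {
  console.log(`${r.name}: ${r.blurb}`);
}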

Working with Collections

Often you’ll want to group related videos together and work with them as a unit. Our abstraction for this is called “Collections”. In a collection, not only can you logically store related resources together under a single umbrella, you can also give the platform guidance on the types of information you want extracted as entities at rest for later use.

Below are the basics for working with a collection:

// See your current collections
const myCollections = await client.collections.listCollections({ limit: 10 });
console.log(myCollections);

// Create a new collection with extraction configuration
const myCollection = await client.collections.createCollection({
  name: 'My Must Eats',
  description: 'All my favorite food videos in one collection',
  extract_config: {
    schema: {
      foods: ['string'],
      cuisines: ['string'],
    },
  },
});
console.log(myCollection);

// Add a video to the collection
const fileInfo = await client.collections.addVideo(myCollection.id, myFile.id);
console.log(fileInfo);

// List videos in the collection
const collectionFiles = await client.collections.listVideos(myCollection.id);
console.log(collectionFiles);

// Add a YouTube video to the collection
// Note that YouTube processing is limited to speech; to get the full richness of
// multimodal video understanding, upload a video to the Files API instead
const youtubeFileInfo = await client.collections.addYoutubeVideo(
  myCollection.id,
  'https://www.youtube.com/watch?v=BLAH',
);
console.log(youtubeFileInfo);

Once files are processed into a collection, their video entities become available for future reference.

// Get the extracted entities
const entities = await client.collections.getEntities(
  myCollection.id,
  myFile.id,
);
console.log(entities);

Talking with Videos

Similar to how ChatGPT or Claude operate, we expose a chat completion API with a couple of extra parameters that let you interact with your video collections.

Namely, we expose a collections parameter to specify which video collections to talk to, along with flags such as force_search, which hints to the underlying LLM that it should always execute a search for the incoming message, and include_citations, which tells the system to provide references for the information in its response.

// Define your messages
const query = "What do people like about Franklin's in Austin?";
const messages = [{ role: 'user', content: query }];

// Make a chat completion request
const response = await client.chat.createCompletion({
  model: 'nimbus-001',
  messages: messages,
  collections: [myCollection.id],
  force_search: true,
  include_citations: true,
});

// Access the response
console.log(response.choices?.[0]?.message?.content);

// Access citations if included
const citations = response.choices?.[0]?.citations;
if (citations?.length) {
  console.log('\nCitations:');
  citations.forEach((citation, index) => {
    console.log(`\n[${index + 1}] Video Segment:`);
    console.log(`File ID: ${citation.file_id}`);
    console.log(`Timestamp: ${citation.start_time}s - ${citation.end_time}s`);
    console.log(`Cited Text: "${citation.text}"`);
  });
}
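
Because the API is message-based, multi-turn conversations work the same way as with other chat completion APIs: append the assistant’s reply and your follow-up question to the messages array and call createCompletion again.

// Continue the conversation with a follow-up question
messages.push(
  { role: 'assistant', content: response.choices?.[0]?.message?.content ?? '' },
  { role: 'user', content: 'How long is the wait there usually?' },
);

const followUp = await client.chat.createCompletion({
  model: 'nimbus-001',
  messages: messages,
  collections: [myCollection.id],
  force_search: true,
  include_citations: true,
});
console.log(followUp.choices?.[0]?.message?.content);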