Installation

To install the Cloudglue TypeScript/JavaScript SDK, use npm:

npm install @aviaryhq/cloudglue-js

Usage

  1. Get an API key from cloudglue.dev
  2. Set the API key as an environment variable named CLOUDGLUE_API_KEY, or pass the API key as a parameter to the CloudGlue class constructor

Here’s an example of how to create a CloudGlue client:

import { CloudGlue } from '@aviaryhq/cloudglue-js';

const client = new CloudGlue({
  apiKey: 'cg-your-api-key-here',
});
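
Alternatively, if you’ve set the CLOUDGLUE_API_KEY environment variable, you can construct the client without passing the key explicitly (depending on your SDK version, an empty options object may be required):

// Picks up the API key from the CLOUDGLUE_API_KEY environment variable
const clientFromEnv = new CloudGlue();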

Working with Files

Most CloudGlue APIs operate on files that have been uploaded to CloudGlue.

Below are the basics for working with files:

import * as fs from 'fs';
import * as path from 'path';

// See your current files
const myFiles = await client.files.listFiles({ limit: 10 });
console.log(myFiles);

// Upload a local file
const filePath = '/path/to/your/video.mp4';
const fileName = path.basename(filePath);

const fileBuffer = await fs.promises.readFile(filePath);
// `File` is a global in Node 20+; on older runtimes, import it from 'node:buffer'
const file = new File(
  [fileBuffer],
  fileName,
  { type: 'video/mp4' }, // Adjust MIME type based on your file type
);

const uploadResponse = await client.files.uploadFile({
  file,
  // Optional user-defined metadata that can be stored with the file
  metadata: {
    topics: ['education', 'software'],
  },
});

// Access the upload response
console.log(`File ID: ${uploadResponse.data.id}`);
console.log(`Status: ${uploadResponse.data.status}`);
console.log(`URI: ${uploadResponse.data.uri}`);

// You can also check status of a previously uploaded file
const fileInfo = await client.files.getFile(uploadResponse.data.id);
console.log(fileInfo);
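
Processing takes some time after upload, so you’ll often want to wait for a file to finish before using it elsewhere. Below is a minimal polling sketch; the status values ('completed', 'failed') and the exact response shape are assumptions, so adjust them to match your SDK version:

// Poll until the file reaches a terminal state
// (the 'completed'/'failed' status values are assumptions)
async function waitForFile(fileId: string, intervalMs = 5000) {
  for (;;) {
    const info = await client.files.getFile(fileId);
    if (info.status === 'completed') return info;
    if (info.status === 'failed') throw new Error(`File ${fileId} failed to process`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

const readyFile = await waitForFile(uploadResponse.data.id);
console.log(readyFile);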

Note: When uploading files, make sure to:

  1. Handle the file reading properly, with error checking
  2. Set the correct MIME type for your video file (e.g., ‘video/mp4’, ‘video/quicktime’)
  3. Include relevant metadata about the upload
  4. Handle potential upload errors appropriately (see the sketch below)
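
For example, a defensive upload might look like the following sketch, which wraps both the file read and the upload call in a single try/catch:

try {
  const buffer = await fs.promises.readFile(filePath); // throws if the path is missing or unreadable
  const upload = await client.files.uploadFile({
    file: new File([buffer], fileName, { type: 'video/mp4' }),
  });
  console.log(`Uploaded file ${upload.data.id}`);
} catch (err) {
  // Network/API failures from uploadFile also land here
  console.error('Upload failed:', err);
}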

Extracting Structured Information from Videos

Organizing information into structured entity schemas allows for response types that are easy to program AI applications against. Let’s get started with extracting structured entity information from videos.

Prompt Only Extraction

Below we’ll show how to extract information from a video using natural language to guide the entire process. This is particularly helpful during the exploratory phase, when your entity structure may not be completely known yet.

// Specify a prompt for what you are interested in extracting
const prompt =
  'Extract the restaurant name and specialty information from this video';

// Create an extract job (`myFile` refers to a previously uploaded file; see Working with Files)
const extractJob = await client.extract.createExtract(myFile.uri, {
  prompt: prompt,
});

// Check the status of the job
const updatedExtractJob = await client.extract.getExtract(extractJob.job_id);
console.log(updatedExtractJob);
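
Extraction runs asynchronously, so in practice you’ll poll getExtract until the job finishes. A minimal sketch, assuming the job exposes a status field that eventually reads 'completed' or 'failed' (the exact values may differ in your SDK version):

// Poll the extract job until it reaches a terminal state
async function waitForExtract(jobId: string, intervalMs = 5000) {
  for (;;) {
    const job = await client.extract.getExtract(jobId);
    if (job.status === 'completed') return job;
    if (job.status === 'failed') throw new Error(`Extract job ${jobId} failed`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

const finishedJob = await waitForExtract(extractJob.job_id);
console.log(finishedJob);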

YouTube URLs are also supported as input to this API. Note that YouTube videos are currently limited to speech- and metadata-level understanding; for fully fledged multimodal video understanding, upload a file to the CloudGlue Files API and use that as input instead.

// Extract entities from the spoken content of a YouTube video
const result = await client.extract.createExtract(
  'https://www.youtube.com/shorts/BLAH',
  { prompt: 'Extract the main talking points in this video' },
);
console.log(result);

Schema Driven Extraction

In CloudGlue you can direct the extraction process to return data in a specific format, which is helpful when your downstream application programs against specific fields or stores data in a database with a specific structure.

We allow users to specify schemas either as an example/abbreviated JSON object or as a fully fledged JSON object specification. For convenience, we also provide a graphical entity schema builder.

For example, with food review videos, let’s say we want to capture each restaurant’s name and a short blurb from the reviewer to store in our table, in which case your entity schema might look something like this:

const schema = {
  restaurants: [
    {
      name: 'string',
      blurb: 'blurb by reviewer on why this place matters',
    },
  ],
};

// Schema driven extraction
const result = await client.extract.createExtract(myFile.uri, {
  schema: schema,
  // Optionally, you can also specify a prompt to further guide the extraction
  // prompt: prompt
});
console.log(result);
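
Because the schema fixes the shape of the output, you can program against it with a TypeScript interface. A sketch, assuming the finished job exposes the extracted entities under a data field (that field name is an assumption; adjust it to your SDK version’s response shape):

interface Restaurant {
  name: string;
  blurb: string;
}

interface ExtractedEntities {
  restaurants: Restaurant[];
}

// The `data` field here is an assumption about the response shape
const extracted = result.data as ExtractedEntities;
for (const r of extracted.restaurants ?? []) {
  console.log(`${r.name}: ${r.blurb}`);
}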

Working with Collections

Often you’ll want to group related videos together and work with them as a unit. Our abstraction for this is called “Collections”. In a collection, not only can you logically store related resources together under a single umbrella, you can also give the platform guidance on the types of information you want extracted as entities at rest for later use.

Below are the basics for working with a collection:

// See your current collections
const myCollections = await client.collections.listCollections({ limit: 10 });
console.log(myCollections);

// Create a new collection with extraction configuration
const myCollection = await client.collections.createCollection({
  name: 'My Must Eats',
  description: 'All my favorite food videos in one collection',
  extract_config: {
    schema: {
      foods: ['string'],
      cuisines: ['string'],
    },
  },
});
console.log(myCollection);

// Add a video to the collection
const fileInfo = await client.collections.addVideo(myCollection.id, myFile.id);
console.log(fileInfo);

// List videos in the collection
const collectionFiles = await client.collections.listVideos(myCollection.id);
console.log(collectionFiles);

// Add a YouTube video to the collection
// Note that YouTube processing is limited to speech; to get the full richness of
// multimodal video understanding, upload a video to the Files API instead
const youtubeFileInfo = await client.collections.addYoutubeVideo(
  myCollection.id,
  'https://www.youtube.com/watch?v=BLAH',
);
console.log(youtubeFileInfo);

Once files are processed into a collection, their video entities become available for future reference.

// Get the extracted entities
const entities = await client.collections.getEntities(
  myCollection.id,
  myFile.id,
);
console.log(entities);

Talking with Videos

Similar to how ChatGPT or Claude operate, we expose a chat completion API with a couple of extra parameters that let you interact with your video collections.

Namely, we expose a collections parameter to specify which video collections to talk to, along with flags such as force_search, which hints to the underlying LLM that it should always execute a search for the incoming message, and include_citations, which tells the system to provide references for the information in its response.

// Define your messages
const query = "What do people like about Franklin's in Austin?";
const messages = [{ role: 'user', content: query }];

// Make a chat completion request
const response = await client.chat.createCompletion({
  model: 'nimbus-001',
  messages: messages,
  collections: [myCollection.id],
  force_search: true,
  include_citations: true,
});

// Access the response
console.log(response.choices?.[0]?.message?.content);

// Access citations if included
const citations = response.choices?.[0]?.citations;
if (citations?.length) {
  console.log('\nCitations:');
  citations.forEach((citation, index) => {
    console.log(`\n[${index + 1}] Video Segment:`);
    console.log(`File ID: ${citation.file_id}`);
    console.log(`Timestamp: ${citation.start_time}s - ${citation.end_time}s`);
    console.log(`Cited Text: "${citation.text}"`);
  });
}
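
Because the API is message-based, multi-turn conversations work the same way as with other chat completion APIs: append the assistant’s reply and your follow-up question to the messages array and call createCompletion again.

// Continue the conversation with a follow-up question
messages.push(
  { role: 'assistant', content: response.choices?.[0]?.message?.content ?? '' },
  { role: 'user', content: 'How long is the wait there usually?' },
);

const followUp = await client.chat.createCompletion({
  model: 'nimbus-001',
  messages: messages,
  collections: [myCollection.id],
  force_search: true,
  include_citations: true,
});
console.log(followUp.choices?.[0]?.message?.content);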