Video is a treasure trove of information, but developers need a way to get that information out. Speech-to-text APIs can extract what is said in a video, but speech alone doesn’t give you the full picture.

With Cloudglue, you can transcribe a video and get the speech alongside other information like on-screen text and visual scene descriptions.

This lets you build more capable applications that draw on more of the video’s content, in a simple and straightforward way.

Cloudglue can transcribe either files you upload or videos from YouTube (YouTube videos are currently limited to speech; see the YouTube section).

The Basics

Transcribe Config

The transcribe config is a JSON object that contains the configuration for the transcription. It dictates what the transcription will include from the video/audio file.

By default, the transcription includes the speech and a summary of the video/audio file. You can also include visual scene descriptions and on-screen text or captions.

Example Config

Here’s an example of the config you can use to transcribe a video/audio file.

{
  // Defaults to true
  "enable_speech": true,
  // Defaults to false
  "enable_scene_text": false,
  // Defaults to false
  "enable_visual_scene_description": false,
  // Defaults to true
  "enable_summary": true
}

By altering the config, you can explicitly control what gets generated from the video/audio file.
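
As a minimal sketch of working with this config, the helper below merges your overrides onto the defaults listed in the example above and rejects unknown or non-boolean options. The names `DEFAULT_TRANSCRIBE_CONFIG` and `buildTranscribeConfig` are hypothetical and not part of the Cloudglue SDK; only the four option names and their defaults come from this page.

```javascript
// Defaults as listed in the example config above.
const DEFAULT_TRANSCRIBE_CONFIG = Object.freeze({
  enable_speech: true,
  enable_scene_text: false,
  enable_visual_scene_description: false,
  enable_summary: true,
});

// Hypothetical helper (not part of the Cloudglue SDK): merges overrides onto
// the documented defaults and rejects unknown or non-boolean options.
function buildTranscribeConfig(overrides = {}) {
  const knownKeys = Object.keys(DEFAULT_TRANSCRIBE_CONFIG);
  for (const [key, value] of Object.entries(overrides)) {
    if (!knownKeys.includes(key)) {
      throw new Error(`Unknown transcribe option: ${key}`);
    }
    if (typeof value !== 'boolean') {
      throw new TypeError(`${key} must be a boolean`);
    }
  }
  return { ...DEFAULT_TRANSCRIBE_CONFIG, ...overrides };
}

// Turn on on-screen text while keeping every other default.
const exampleConfig = buildTranscribeConfig({ enable_scene_text: true });
console.log(exampleConfig);
```

Validating the keys up front means a typo such as `enable_scenetext` fails loudly in your own code instead of being sent along unnoticed.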

Generating a Transcription

Example

It takes only a few lines of code to generate a transcription with Cloudglue.

// File uri comes from our files API, in the form of cloudglue://file_id
const transcription = await cloudglue.transcribe.createTranscribe(fileUri, {
  enable_speech: true,
  enable_scene_text: true,
  enable_visual_scene_description: true,
  enable_summary: true,
});

// ...

// Or, if you want to use a YouTube video.
// Note: YouTube sources currently produce speech transcriptions only
// (see the YouTube section below).
const youtubeTranscription = await cloudglue.transcribe.createTranscribe('https://www.youtube.com/watch?v=AJpK3YTTKZ4', {
  enable_speech: true,
  enable_scene_text: true,
  enable_visual_scene_description: true,
  enable_summary: true,
});

Transcription Outputs

Retrieving a transcription takes a single call.

Getting the Transcription

const transcription = await cloudglue.transcribe.getTranscribe(transcriptionId);
console.log(transcription);

Examples

Here’s a truncated example of the output you get when you transcribe a video/audio file.

JSON Example

JSON Output
{
  "data": {
    "title": "Maryland Public Service Commission Hearing on Chaberton Energy's Wild Turkey Community Solar Project",
    "summary": "This video documents a Maryland Public Service Commission hearing held on May 22, 2024, regarding the application for a certificate of public convenience and necessity for the Wild Turkey community solar project by Chaberton Energy. The hearing, presided over by Chief Public Utility Law Judge Chuck McLean, includes introductions of involved attorneys and state agency representatives, followed by a detailed presentation from Chaberton Energy representatives about the project's benefits, community engagement efforts, environmental considerations, and technical design. The project aims to provide renewable energy benefits, local tax revenue, and subscriber savings while ensuring environmental protection and compliance with state regulations. The session concludes with procedural updates and plans for future public hearings.",
    "speech": [
      {
        "text": "Alright.",
        "start_time": 2.8799999,
        "end_time": 3.6
      },
      {
        "text": "Good morning, everybody.",
        "start_time": 3.9199998,
        "end_time": 5.04
      },
      {
        "text": "Everybody hear me okay?",
        "start_time": 5.04,
        "end_time": 6.3999996
      },
      {
        "text": "Yes,",
        "start_time": 6.3999996,
        "end_time": 6.8
      },
      {
        "text": "your honor.",
        "start_time": 8.4,
        "end_time": 8.96
      },
      {
        "text": "Excellent.",
        "start_time": 8.96,
        "end_time": 9.679999
      },
      {
        "text": "Alright.",
        "start_time": 9.679999,
        "end_time": 10.16
      },
      {
        "text": "Doug, we're",
        "start_time": 10.16,
        "end_time": 10.719999
      },
      {
        "text": "ready to go, sir?",
        "start_time": 10.719999,
        "end_time": 11.36
      },
      {
        "text": "Yes.",
        "start_time": 12.095,
        "end_time": 12.575001
      },
      {
        "text": "Can you hear me okay, your honor?",
        "start_time": 12.575001,
        "end_time": 14.175
      },
      {
        "text": "Yes, sir.",
        "start_time": 14.175,
        "end_time": 14.895
      },
      {
        "text": "Let's go on the record, sir.",
        "start_time": 14.895,
        "end_time": 16.335
      },
      {
        "text": "We are on the record.",
        "start_time": 17.535,
        "end_time": 18.815
      },
      {
        "text": "Thank you very much.",
        "start_time": 18.815,
        "end_time": 19.695
      },
      {
        "text": "Good evening, everyone.",
        "start_time": 19.695,
        "end_time": 20.575
      },
      {
        "text": "Good evening, everyone.",
        "start_time": 19.695,
        "end_time": 20.575
      },
      {
        "text": "Today is 05/22/2024, and it's approximately 7PM.",
        "start_time": 20.575,
        "end_time": 25.055
      },
      {
        "text": "My name is Chuck McLean.",
        "start_time": 25.055,
        "end_time": 26.095001
      },
      {
        "text": "I'm the chief public utility law judge at the Maryland Public Service Commission, and I am filling in for judge Christine Burke, uh, for the, uh, just this hearing, so you're not gonna be stuck with me at least for the duration of this particular case.",
        "start_time": 26.095001,
        "end_time": 38.25
      },
      {
        "text": "Now no one, uh, signed up to speak this evening.",
        "start_time": 69.365005,
        "end_time": 71.925
      },
      {
        "text": "You may also file written comments by sending them directly to the commission.",
        "start_time": 95.759995,
        "end_time": 100.245
      },
      {
        "text": "Uh, attention, Jamie Bergen, chief clerk, six Saint Paul Street, sixteenth floor, Baltimore, Maryland two one two zero two.",
        "start_time": 100.565,
        "end_time": 109.365
      },
      {
        "text": "Uh, and please reference case number 9717 so the case, uh, so the comments make it to the correct file.",
        "start_time": 109.685,
        "end_time": 115.6
      },
      ...
    ],
    "visual_scene_description": [
      {
        "text": "The video begins with a black screen displaying the name \"Doug\".\nThe scene transitions to a man identified as Law Judge. He is wearing a suit jacket. Behind him are framed pictures on a white wall.\nThe scene changes to a man identified as Andy Flavin. He is wearing a suit and tie and glasses. Behind him is a wooden cabinet, a picture, and a telephone.\nThe scene returns to the man identified as Law Judge. He looks down.\nThe scene transitions back to the black screen with the name \"Doug\".\nThe scene returns to the man identified as Law Judge.",
        "start_time": 0,
        "end_time": 20
      },
      {
        "text": "A man with short graying hair, wearing a light brown suit jacket and a blue shirt, speaks directly to the camera. He is centered in the frame, with a white wall behind him. On the wall are two framed pictures and a display of small items. The lighting is neutral, and the camera remains static throughout the scene.",
        "start_time": 20,
        "end_time": 40
      },
      {
        "text": "A man in a suit and blue shirt speaks directly to the camera. He is in front of a white wall with framed pictures. The camera angle is a medium shot, focusing on the man's face and upper body.",
        "start_time": 40,
        "end_time": 60
      },
      {
        "text": "A man with short gray hair and a tan suit speaks directly to the camera. He is indoors with a white wall behind him with two framed pictures.",
        "start_time": 60,
        "end_time": 80
      },
      {
        "text": "The video shows a man with short, graying hair wearing a tan suit jacket. He is speaking directly to the camera. Behind him is a white wall with two framed pictures.",
        "start_time": 80,
        "end_time": 100
      },
      {
        "text": "A man with short graying hair wearing a suit jacket and blue shirt is speaking directly to the camera. Behind him is a white wall with two framed pictures. The lighting is bright and the camera angle is a medium close-up.",
        "start_time": 100,
        "end_time": 120
      },
      {
        "text": "A man with short, graying hair and a light complexion is speaking directly to the camera. He is wearing a tan suit jacket and a light blue shirt. The background is a plain white wall with three framed pictures hanging on it. The lighting is neutral, and the camera angle is a medium close-up, focusing on his face and upper body.",
        "start_time": 120,
        "end_time": 140
      },
      ...
    ],
    "scene_text": [
      {
        "text": "Doug\nLaw Judge\nAndy Flavin\nLaw Judge\nDoug\nLaw Judge",
        "start_time": 0,
        "end_time": 20
      },
      {
        "text": "Law Judge",
        "start_time": 20,
        "end_time": 39
      },
      {
        "text": "@Chaberton\nAmee Beame - Chab...\nchaberton\nENERGY\nPROJECT\nWhat is\nCommunity\nSolar?\nBENEFITS\nThank You\n7:04 PM\n5/22/2024",
        "start_time": 273,
        "end_time": 279
      },
      {
        "text": "Law Judge",
        "start_time": 274,
        "end_time": 276
      },
      {
        "text": "chaberton\nENERGY\nWhat is\nCommunity\nSolar?\nBENEFITS",
        "start_time": 280,
        "end_time": 282
      },
      {
        "text": "PROPOSED DIVISION\nAREA (APPROX. 25.45\nACRES) (10.30 HECTARES)",
        "start_time": 420,
        "end_time": 421
      },
      {
        "text": "PROPOSED DIVISION\nAREA (APPROX. 25.45\nACRES) (10.30 HECTARES)\nCLEAR EXISTING\nOR IRRIGATION\nLINE (TYP.)\nPROPOSED SECURITY\nFENCE (TYP.)\nMODULAR GROUND\nMOUNTED SOLAR ARRAY\n(APPROX. 18.15 ACRES)\n(7.35 HECTARES) (TYP.)\nACCESS WITHOUT\nIMPACT TO\nTHE MAIN\nCHANNEL (TYP.)\nEXISTING CULVERT\nTO BE UPGRADED\nPROPOSED 12' WIDE\nACCESS DRIVE WITH TURNING\nRADIUS TO BE UPGRADED\n(TYP.)\nPROPOSED\nDOUBLE GATE\nUTILITY POINT OF\nINTERCONNECTION\nUTILITY METER\nAND MAIN\nSWITCH\nCLEARANCE\n1.5-2METER\nCLEARANCE WITHIN\nACCESS DRIVEWAY\nAND UTILITY ACCESSIBLE\nEASEMENT AREA\nAND DISCONNECT POLE\nPROPOSED\nELECTRIC LINE (TYP.)\nPROPOSED LANDSCAPE\nSCREENING (TYP.)\nPROPOSED SOLAR\nLANDSCAPE\nSCREENING (ASO\n(ASO GRASSING)\nTRANSFORMERS, INVERTERS,\nAND COMPONENTS REQUIRED\nTO CONVERT POWER (TYP.)\nPROPOSED\nPAD FOR\nTRANSFORMER\n(TYP.)\nPROPOSED SOLAR\nLANDSCAPE\nSCREENING (TYP.)\nNOTE:\nBRIGHT SURFACES,\nINCLUDING SOLAR PANELS\nTO BE LOCATED 100M\nFROM FENCE\n@chobertson\nArnie Beame - Chab...",
        "start_time": 420,
        "end_time": 439
      },
      {
        "text": "@cRobertson",
        "start_time": 540,
        "end_time": 560
      },
      {
        "text": "If there are any other questions or concerns please contact us\nAmee Bearne, Community Impact Manager\nAmee.Bearne@Chaberton.com\nNatalie Castro, Development Manager\nNatalie.Castro@Chaberton.com\nThank You",
        "start_time": 592,
        "end_time": 599
      },
      {
        "text": "If there are any other questions or concerns please contact us\nAmee Bearne, Community Impact Manager\nAmee.Bearne@Chaberton.com\nNatalie Castro, Development Manager\nNatalie.Castro@Chaberton.com\nThank You",
        "start_time": 600,
        "end_time": 607
      },
      {
        "text": "Law Judge",
        "start_time": 607,
        "end_time": 608
      },
      {
        "text": "Andy Flavin",
        "start_time": 612,
        "end_time": 613
      },
      {
        "text": "Law Judge",
        "start_time": 616,
        "end_time": 617
      },
      {
        "text": "Bob Sadzinski",
        "start_time": 618,
        "end_time": 619
      },
      {
        "text": "Bob Sadzinski",
        "start_time": 620,
        "end_time": 621
      },
      ...
    ]
  },
  "transcribe_config": {
    "enable_speech": true,
    "enable_visual_scene_description": true,
    "enable_scene_text": true,
    "enable_summary": true
  }
}

Let’s look at the different transcriptions available in the JSON output.

  • title: The generated title of the video based on the transcriptions generated.
  • summary: A generated summary of the video based on the transcriptions generated.
  • speech: The speech transcription of the video.
  • visual_scene_description: A description of the scene at different timestamps in the video.
  • scene_text: The on-screen text at different timestamps in the video.
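
Because `speech`, `visual_scene_description`, and `scene_text` all share the same `text` / `start_time` / `end_time` shape, one small helper can slice any of them by time. The sketch below is illustrative only: `segmentsBetween` and `speechToText` are hypothetical names, and the sample data is a few entries copied from the example output above.

```javascript
// Hypothetical helper: returns the entries of a timed array (speech,
// scene_text, or visual_scene_description) that overlap the window
// [start, end), with both bounds given in seconds.
function segmentsBetween(segments, start, end) {
  return segments.filter((s) => s.start_time < end && s.end_time > start);
}

// Hypothetical helper: joins speech segments into one plain-text string.
function speechToText(speech) {
  return speech.map((s) => s.text.trim()).join(' ');
}

// A few entries copied from the example JSON output above.
const sampleData = {
  speech: [
    { text: 'Alright.', start_time: 2.8799999, end_time: 3.6 },
    { text: 'Good morning, everybody.', start_time: 3.9199998, end_time: 5.04 },
    { text: 'Everybody hear me okay?', start_time: 5.04, end_time: 6.3999996 },
  ],
  scene_text: [{ text: 'Law Judge', start_time: 20, end_time: 39 }],
};

// What was said between seconds 4 and 5 of the video?
console.log(speechToText(segmentsBetween(sampleData.speech, 4, 5)));

// What text was on screen at any point between seconds 30 and 35?
console.log(segmentsBetween(sampleData.scene_text, 30, 35));
```

The same overlap test lets you line up speech with the scene description or on-screen text covering the same stretch of video.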

We also support Markdown output, which is well suited for use with LLMs.

Markdown Example

The following is a truncated example of the markdown output.

Markdown Output
# Video Document

## Title

Maryland Public Service Commission Hearing on Chaberton Energy's Wild Turkey Community Solar Project

## Summary

This video documents a Maryland Public Service Commission hearing held on May 22, 2024, regarding the application for a certificate of public convenience and necessity for the Wild Turkey community solar project by Chaberton Energy. The hearing, presided over by Chief Public Utility Law Judge Chuck McLean, includes introductions of involved attorneys and state agency representatives, followed by a detailed presentation from Chaberton Energy representatives about the project's benefits, community engagement efforts, environmental considerations, and technical design. The project aims to provide renewable energy benefits, local tax revenue, and subscriber savings while ensuring environmental protection and compliance with state regulations. The session concludes with procedural updates and plans for future public hearings.

---

## Scenes

### Scene [00:00 - 00:20]

#### Video Conference Call

A video conference call between multiple participants.

**Visual Content:**

- [00:00 - 00:20]
  - The video begins with a black screen displaying the name "Doug".
  - The scene transitions to a man identified as Law Judge.
  - He is wearing a suit jacket.
  - Behind him are framed pictures on a white wall.
  - The scene changes to a man identified as Andy Flavin.
  - He is wearing a suit and tie and glasses.
  - Behind him is a wooden cabinet, a picture, and a telephone.
  - The scene returns to the man identified as Law Judge.
  - He looks down.
  - The scene transitions back to the black screen with the name "Doug".
  - The scene returns to the man identified as Law Judge.

**Speech:**

- [00:02 - 00:03] Alright.
- [00:03 - 00:05] Good morning, everybody.
- [00:05 - 00:06] Everybody hear me okay?
- [00:06 - 00:06] Yes,
- [00:08 - 00:08] your honor.
- [00:08 - 00:09] Excellent.
- [00:09 - 00:10] Alright.
- [00:10 - 00:10] Doug, we're
- [00:10 - 00:11] ready to go, sir?
- [00:12 - 00:12] Yes.
- [00:12 - 00:14] Can you hear me okay, your honor?
- [00:14 - 00:14] Yes, sir.
- [00:14 - 00:16] Let's go on the record, sir.
- [00:17 - 00:18] We are on the record.
- [00:18 - 00:19] Thank you very much.
- [00:19 - 00:20] Good evening, everyone.

**On-screen Text:**

- [00:00 - 00:20] "Doug" | "Law Judge" | "Andy Flavin"

---

### Scene [00:20 - 00:40]

#### Maryland Public Service Commission Hearing

Chuck McClean, the Chief Public Utility Law Judge at the Maryland Public Service Commission, fills in for Judge Christine Burke for a hearing on May 22, 2024.

**Visual Content:**

- [00:20 - 00:40]
  - A man with short graying hair, wearing a light brown suit jacket and a blue shirt, speaks directly to the camera.
  - He is centered in the frame, with a white wall behind him.
  - On the wall are two framed pictures and a display of small items.
  - The lighting is neutral, and the camera remains static throughout the scene.

**Speech:**

- [00:19 - 00:20] Good evening, everyone.
- [00:20 - 00:25] Today is 05/22/2024, and it's approximately 7PM.
- [00:25 - 00:26] My name is Chuck McLean.
- [00:26 - 00:38] I'm the chief public utility law judge at the Maryland Public Service Commission, and I am filling in for judge Christine Burke, uh, for the, uh, just this hearing, so you're not gonna be stuck with me at least for the duration of this particular case.

**On-screen Text:**

- [00:20 - 00:39] "Law Judge"

---
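
The timestamps in the Markdown example above look like the JSON `start_time` and `end_time` values floored to whole seconds and printed as `MM:SS` (for instance, 2.8799999 to 3.6 becomes `[00:02 - 00:03]`). If you only have the JSON output and want similar lines, a minimal sketch follows. It is an inference from the sample, not Cloudglue's own renderer; `formatTimestamp` and `speechToMarkdownLines` are hypothetical names.

```javascript
// Formats seconds as MM:SS, flooring fractional seconds. The flooring rule
// is inferred from the example output, not documented behavior. Videos
// longer than an hour simply get minutes of 60 or more in this sketch.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, '0');
  const ss = String(total % 60).padStart(2, '0');
  return `${mm}:${ss}`;
}

// Renders speech entries as Markdown list items in the style shown above.
function speechToMarkdownLines(speech) {
  return speech.map(
    (s) => `- [${formatTimestamp(s.start_time)} - ${formatTimestamp(s.end_time)}] ${s.text}`
  );
}

// Entries copied from the example JSON output.
const sampleSpeech = [
  { text: 'Alright.', start_time: 2.8799999, end_time: 3.6 },
  { text: 'Yes,', start_time: 6.3999996, end_time: 6.8 },
];

console.log(speechToMarkdownLines(sampleSpeech).join('\n'));
```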

Looking for other categories of information from the video? Learn more about our extraction features and what they can do for you.

Key Features

  • Speech transcription: Speech to text transcription of video/audio files.
  • Scene text transcriptions: On-screen text or captions from a video/audio file at different timestamps.
  • Visual scene transcripts: Descriptions of different scenes in a video, at different timestamps.
  • Title + Summary: A generated title and summary of the video/audio file, based on all the transcriptions available.
  • Markdown compatible: Transcriptions can also be generated as Markdown, so you can use them with LLMs right away.

Try it out

Check out our Transcribe Video Endpoint to start building your own video/audio processing with Cloudglue, or get started on our platform.

YouTube

At the moment, if you want to transcribe a video directly from YouTube, we only support generating speech transcriptions.

If you would like to get the full spectrum of transcriptions for a YouTube video, you’ll need to download the video and upload it to Cloudglue.
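
One way to account for this limitation in your own code is to detect YouTube URLs and switch off the visual options before calling the API. The sketch below is a hedged illustration: `isYouTubeUrl` and `configForSource` are hypothetical helpers, the host list is an assumption, and because this page does not say whether summaries are generated for YouTube sources, `enable_summary` is left as requested.

```javascript
// Assumed set of YouTube hostnames; extend it if you accept other variants.
const YOUTUBE_HOSTS = new Set([
  'youtube.com',
  'www.youtube.com',
  'm.youtube.com',
  'youtu.be',
]);

// Hypothetical helper: true when the source string parses as a YouTube URL.
function isYouTubeUrl(source) {
  try {
    return YOUTUBE_HOSTS.has(new URL(source).hostname.toLowerCase());
  } catch {
    return false; // not a parseable URL
  }
}

// Hypothetical helper: for YouTube sources, disables the options that this
// page says are unsupported; other sources keep the requested config.
function configForSource(source, requested) {
  if (!isYouTubeUrl(source)) {
    return { ...requested };
  }
  return {
    ...requested,
    enable_scene_text: false,
    enable_visual_scene_description: false,
  };
}

const requestedConfig = {
  enable_speech: true,
  enable_scene_text: true,
  enable_visual_scene_description: true,
  enable_summary: true,
};

console.log(configForSource('https://www.youtube.com/watch?v=AJpK3YTTKZ4', requestedConfig));
console.log(configForSource('cloudglue://file_id', requestedConfig));
```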