Oct 6, 2023

Summarize a Video with LLM: a Tutorial

A short tutorial on downloading YouTube transcripts, restoring punctuation, and using the OpenAI API to summarize or query a video.

This short tutorial shows you how to query and summarize YouTube videos with LLMs. It walks you through using the OpenAI API and a punctuation restoration model to summarize a YouTube video.

It is not only an easy, quick, and cheap way to summarize YouTube videos, but also a good lesson in working with ASR (automatic speech recognition) transcripts and in querying LLMs through an API.

What tools you will learn

In this tutorial you will learn how to use three separate sets of tools:

  • YouTube Transcript API: You will learn how to download transcripts from YouTube videos.
  • Rpunct Library: You will learn how to transform raw ASR transcripts into grammatically correct, reader-friendly content.
  • OpenAI API: You will learn how to pass the punctuated text to OpenAI models through the API and get the desired answer.

So... Let's start!

First, we have to install the requirements: the YouTube Transcript API, the Rpunct library used for punctuation prediction, and the OpenAI Python library.

bash
pip install youtube_transcript_api git+https://github.com/babthamotharan/rpunct.git@patch-2 "openai<1"

And that's it!

Let's download the YouTube transcript

Now we will use the YouTube Transcript API to download the automatically generated subtitles (ASR transcripts) of a YouTube video. As you will see, these generated transcripts, like most other ASR outputs, do not have punctuation.

We will try it with a BugBytes video on EmbedChain: https://www.youtube.com/watch?v=IVfcAgxTO4I

python
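from youtube_transcript_api import YouTubeTranscriptApi

def get_video_id(url_link):
    # Everything after "watch?v=" is treated as the video ID
    return url_link.split("watch?v=")[-1]

video_link = "https://www.youtube.com/watch?v=IVfcAgxTO4I"

# Each transcript entry is a dict; its "text" key holds the caption text
transcript = YouTubeTranscriptApi.get_transcript(get_video_id(video_link))
# Join all caption fragments into one long string
transcript_joined = " ".join([line["text"] for line in transcript])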

This is what we got:

in this video we're going to look at a new package called embedging and this allows you to very easily create language model powered Bots over any data set in this introduction video we're going to show how to use this library but in future videos we're going to build (...)
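The `get_video_id` helper above splits on `watch?v=`, so it returns extra characters when the link carries more query parameters (for example `&t=42s`) and does not handle `youtu.be` short links. The sketch below is one possible drop-in replacement built on the standard library. The set of accepted host names is an assumption and can be extended.

```python
from urllib.parse import urlparse, parse_qs

# Host names accepted for standard watch links (an assumption, extend as needed)
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def get_video_id(url_link):
    """Extract the video ID from common YouTube URL shapes."""
    parsed = urlparse(url_link)
    host = (parsed.hostname or "").lower()
    # Short links look like https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        if video_id:
            return video_id
    # Standard links carry the ID in the "v" query parameter
    if host in YOUTUBE_HOSTS:
        video_ids = parse_qs(parsed.query).get("v")
        if video_ids:
            return video_ids[0]
    raise ValueError(f"Could not find a video ID in: {url_link}")


print(get_video_id("https://www.youtube.com/watch?v=IVfcAgxTO4I&t=42s"))
```

Raising an error for unrecognized links makes a bad URL fail early, instead of failing later inside the transcript download.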

Let's add punctuation

The original ASR transcript has no punctuation, which makes it difficult to read, so we will try to restore it. LLMs also usually expect punctuated text, and input without punctuation might lead to errors in their answers.

The Rpunct library uses a BERT model for token classification to predict punctuation and letter capitalization.

python
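from rpunct import RestorePuncts

# Load the punctuation restoration model
rpunct = RestorePuncts()
# Predict punctuation and capitalization for the raw transcript
results = rpunct.punctuate(transcript_joined)
print(results)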

The text with predicted punctuation:

In this video, we're going to look at a new package called Embedging and this allows you to very easily create language Model powered Bots over any data set. In this introduction video, we're going to show how to use this library, but in future videos, we're going to build some tools around embedging and I can elaborate more at the end of the video on what we might build. (...)

As we can see, it is much more human-readable now.
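To make the token classification idea more concrete, here is a toy sketch of the last step: turning one predicted label per word back into readable text. The two-character label scheme is an assumption made for illustration and is not necessarily the label set Rpunct uses. The labels are written by hand here, whereas in practice they would come from the model.

```python
def apply_labels(words, labels):
    """Rebuild text from per-word labels.

    Each label has two characters in this toy scheme (hypothetical):
      first character  - punctuation to append, or "O" for none
      second character - "U" to capitalize the word, "O" to leave it as is
    """
    restored = []
    for word, label in zip(words, labels):
        punctuation, casing = label[0], label[1]
        if casing == "U":
            word = word[0].upper() + word[1:]
        if punctuation != "O":
            word += punctuation
        restored.append(word)
    return " ".join(restored)


words = "in this video we're going to look at a new package".split()
# Hand-written labels standing in for model predictions
labels = ["OU", "OO", ",O", "OO", "OO", "OO", "OO", "OO", "OO", "OO", ".O"]
print(apply_labels(words, labels))
# → In this video, we're going to look at a new package.
```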

Query OpenAI GPT-3.5 Turbo

Finally, we can ask an OpenAI model to summarize the text, answer questions about it, or even write a blog post about the video.

Now you have to find and add your API key. Here's how to find it:

  • Visit the OpenAI platform at https://platform.openai.com and log in.
  • Once logged in, navigate to your account settings. Look for the "API keys" section.
  • Generate a new API key if you don't have one, or use an existing one.

With the key in place, you have the access you need to use the OpenAI API for the video summarization task.
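One common way to add the key without writing it into the code is an environment variable. The sketch below assumes the variable is called `OPENAI_API_KEY`. The helper name `load_api_key` is hypothetical and not part of the OpenAI library.

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before calling the API.")
    return key
```

With the pre-1.0 `openai` library used in this tutorial, you could then set `openai.api_key = load_api_key()` before sending any request.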

python
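import openai

# openai.ChatCompletion is part of the pre-1.0 openai library, which reads the
# API key from the OPENAI_API_KEY environment variable (or from openai.api_key)

# Use the punctuated transcript produced by Rpunct
prompt = f"Summarize this text: \ntext = {results}"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    temperature=1,
    max_tokens=256,  # upper limit on the length of the generated summary
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

# The generated summary is in the first choice's message
print(response["choices"][0]["message"]["content"])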

Generated summary: The text is a detailed explanation of how to use the embed chain package to create language model-powered Bots. The author walks through the process of adding resources to the bot, querying the language model, and obtaining responses based on the provided context. The author demonstrates using web pages, YouTube videos, and a transcribed travel video as resources for the bot. The text also mentions the use of the openai language model and the chroma vector database. The author suggests potential future videos on vector databases and building a chatbot in a web application.
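Transcripts of long videos may not fit into the model's context window in a single request. Because the punctuation has been restored, the text can be split on sentence boundaries. The sketch below groups whole sentences into chunks under a word budget. The function name and the default of 1500 words are assumptions, and a word count is only a rough stand-in for the model's token count.

```python
import re


def split_into_chunks(text, max_words=1500):
    """Group whole sentences into chunks of at most roughly max_words words."""
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Start a new chunk when this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine."
print(split_into_chunks(sample, max_words=6))
# → ['One two three. Four five six.', 'Seven eight nine.']
```

Each chunk could then be summarized with the same request as above, and the partial summaries combined in one final request.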

If you prefer, you can ask for the summary in a different format, for example as bullet points:

python
# Same request as before; only the instruction in the prompt changes
prompt = f"Summarize this text using bullet points: \ntext = {results}"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    temperature=1,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

print(response["choices"][0]["message"]["content"])

Generated summary in bullet points:

  • The text discusses a new package called "embedchain" that allows users to create language model powered bots over any dataset.
  • The package helps with loading datasets, creating embeddings, and storing them in a vector database.
  • The code demonstrates how to add resources to the bot, such as web pages, YouTube videos, and PDF files.
  • Users can query the bot with a prompt and receive a response generated by the language model.
  • The text mentions the use of the OpenAI language model and the importance of having an API key stored in a .env file.
  • The author suggests installing the python-dotenv and embedchain libraries and using Visual Studio Code to run the code.
  • Examples of queries and responses are provided.

Asking questions

Similarly, with a little prompt engineering you can even ask questions about the video, e.g. about processes you didn't understand, or ask the model to clarify some points.
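As a minimal sketch of such a prompt, the helper below builds the `messages` list for a question about the transcript. The function name and the wording of the instruction are assumptions, and the short text is a stand-in for the punctuated transcript.

```python
def build_question_messages(question, text):
    """Build a chat `messages` list that asks a question about the transcript."""
    prompt = (
        "Answer the question using only the video transcript below. "
        "If the transcript does not contain the answer, say so.\n"
        f"question = {question}\n"
        f"text = {text}"
    )
    return [{"role": "user", "content": prompt}]


messages = build_question_messages(
    "Which vector database does the video mention?",
    "The video uses the chroma vector database.",  # stand-in for the punctuated transcript
)
print(messages[0]["content"])
```

You can pass `messages` to the same `openai.ChatCompletion.create` call that was used for the summary.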

Summary

In this tutorial you learned how to download ASR transcripts, add punctuation to them, and use them with an LLM to summarize and query YouTube videos.

If you're interested in more tutorials and news like this, subscribe to KeyGen.

You will find the Google Colab notebook here.