Not all AI products are overhyped; there are some genuinely useful ones, from writing your e-mail to editing your photos to running AI agents and much more.
I have been constantly testing various AI tools, signing up for new ones, trying beta software, and evaluating them against existing competitors to assess their performance.
While navigating this AI world, it is our responsibility to recommend AI tools that genuinely benefit our readers, and we have been doing so. Our recent AI agents guide is one example, featuring 8 AI agents that can be useful for everyday users.
Whisper is one such tool: it transcribes audio into text and can also generate captions and subtitles. Most platforms now offer subtitles, but there are still plenty of shows that don't have them, which makes them extremely hard to understand and often leads people to skip them. Additionally, because studios rely on dubbing, many shows launch with limited language options.
How to Generate Subtitles and Captions for Any Video Using Whisper AI
For this, I am using Whisper, a free, open-source speech-to-text tool from OpenAI that runs on Windows, Mac, and Linux. It supports multiple languages and transcribes the audio of any video or audio file into text. Most importantly, it can generate subtitles and captions for speech in any supported language and translate them into English.
Prerequisites to Install and Use Whisper on Your Mac, Windows, and Linux
You can run Whisper AI in the cloud or locally on your device. Since we are focusing on a free option, this guide runs it locally on your device. But before that, here are the prerequisites for each platform.
Windows
- Python 3.8+
- PyTorch 1.10.1+
- ffmpeg
Mac
- Python 3.8+
- PyTorch 1.10.1+
- ffmpeg
Linux
- Python 3.8+
- PyTorch 1.10.1+
- ffmpeg
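If you would rather check these prerequisites from a script, here is a minimal sketch. The function name check_prerequisites is my own, not part of Whisper, and it only checks that PyTorch and ffmpeg are present; it does not verify the PyTorch 1.10.1 minimum.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # Compare the running interpreter against the minimum version.
        "python": sys.version_info[:2] >= min_python,
        # find_spec returns None when the torch package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        # shutil.which returns None when ffmpeg is not on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Anything reported as missing is covered by the installation steps that follow.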
How to Install Whisper AI Locally on Your Device
Installing Whisper is easy and takes three steps, with minor differences depending on your device. I have listed the method for both Windows and macOS. You can follow the steps to install Whisper locally on your device without any issues. If you’re ready, let’s get started.
Step 1: Install Python on your device
If you already have Python installed on your device, you can skip this step. If you don’t, visit: https://www.python.org/ and install it on your device.
Once installed, check that Python is available on your device.
Open your command prompt (Windows) or terminal (Mac/Linux) and type the following command: python --version (on Mac and Linux, you may need python3 --version instead).
If it shows the Python version, something like Python 3.12.0, it is successfully installed on your device. If you don't see this, follow the process again.
If you have an older version of Python, install a newer release from python.org (Windows and Mac) or through your package manager (Linux). It is also worth updating pip, the Python package installer, before you continue.
- Windows (updates pip): python -m pip install --upgrade pip
- MacOS (updates pip): python3 -m pip install --upgrade pip
- Linux (installs Python 3.12 on Debian/Ubuntu, if your release offers the package): sudo apt install python3.12
Step 2: Install Whisper
Next, install Whisper on your device. Open the terminal on your device and use this command.
- Windows: pip install -U openai-whisper
- MacOS: pip install -U openai-whisper. If you see any error, use this command: python3 -m pip install --user -U openai-whisper
- Linux: pip install -U openai-whisper
Wait until all files are downloaded and installed. Once completed, you will see a message beginning with 'Successfully installed'.
You can use this command to check if it’s installed correctly on your device.
Command: whisper --help
Step 3: Install FFmpeg
Whisper relies on ffmpeg to read audio and video files, so install it on your device next.
On Mac:
You can use Homebrew to install it on your Mac using this simple command: brew install ffmpeg
If you don’t have Homebrew on your Mac:
Step 1: Open Terminal on your Mac
Step 2: Paste this command and press enter: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 3: Enter your Mac password when prompted and wait until all files are downloaded.
Step 4: Use this command to check if Homebrew was installed correctly on your device: brew --version
Install the ffmpeg software
Step 1: Open Terminal and use this command: brew install ffmpeg
Step 2: Wait until all required files have been downloaded.
Step 3: Use this command to check if it’s successfully installed on your device: ffmpeg -version
For Windows:
Step 1: Go to https://ffmpeg.org/download.html and download a Windows build.
Step 2: Extract the archive and add its bin folder to your PATH environment variable so that the ffmpeg command is available in the terminal.
Step 3: Next, verify that everything is installed correctly on your device. Open the terminal on your device and enter the following commands: ffmpeg -version and whisper --help. If you see version details and a list of options, both tools are installed correctly on your device.
How to Generate Subtitles for Video
Step 1: Open the terminal on your device and go to the folder where the video file is located. In this case, my video file is located on my desktop. You can use the following commands to navigate to the video file folder.
- Windows: cd Desktop
- MacOS: cd ~/Desktop
Step 2: Check that your file is in a supported format. Whisper reads media files through ffmpeg, so it handles most common video and audio formats, including the following.
| Video Formats Supported | Audio Formats Supported | 
| .mp4 | .mp3 | 
| .mkv | .wav | 
| .mov | .flac | 
| .webm | .m4a | 
| .avi | .aac | 
| .mpg / .mpeg | .ogg | 
| .flv | .opus | 
| .wmv | .aiff / .aif | 
| .3gp | .amr | 
| | .wma | 
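If your folder holds a mix of files, a short script can list the ones worth passing to Whisper. This is a minimal sketch: the extension set is copied from the table above, and the helper name find_media_files is my own.

```python
from pathlib import Path

# Extensions copied from the table above.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mkv", ".mov", ".webm", ".avi", ".mpg", ".mpeg",
    ".flv", ".wmv", ".3gp",
    ".mp3", ".wav", ".flac", ".m4a", ".aac", ".ogg", ".opus",
    ".aiff", ".aif", ".amr", ".wma",
}


def find_media_files(folder):
    """Return the sorted files in `folder` whose extension is in the table."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        # Lowercase the suffix so VIDEO.MP4 matches as well as video.mp4.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, find_media_files(Path.home() / "Desktop") lists the media files on your desktop.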
Step 3: Now, transcribe the file using this command: whisper (add your video or audio file name here) --model turbo --task transcribe
Example: whisper video.mp4 --model turbo --task transcribe
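If you want to run the command above from a script, for example to process several files, here is a minimal sketch. The helper names build_whisper_command and run_whisper are my own; the --model, --task, and --language flags are the ones used in this guide.

```python
import subprocess


def build_whisper_command(media_file, model="turbo", task="transcribe", language=None):
    """Build the argument list for the whisper command-line tool."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    command = ["whisper", media_file, "--model", model, "--task", task]
    # Only pass --language when the caller names one explicitly.
    if language is not None:
        command += ["--language", language]
    return command


def run_whisper(media_file, **options):
    """Run whisper and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_whisper_command(media_file, **options), check=True)
```

Calling run_whisper("video.mp4") runs the same command as the example above, provided Whisper is installed and on your PATH.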
The first time you use a model, Whisper downloads it, so wait until the download finishes. Here are the various Whisper models you can choose from. Smaller models run faster but are less accurate.
| Model Name | Parameters | Required VRAM | 
| tiny | 39M | 1 GB | 
| base | 74M | 1 GB | 
| small | 244M | 2 GB | 
| medium | 769M | 5 GB | 
| large | 1.55B | 10 GB | 
| turbo | 809M | 6 GB | 
If your device isn’t powerful, I recommend using the tiny or base model, as running large models can consume all your system resources. If you have larger media files and want high accuracy, you can run Whisper on Google Colab.
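To make the choice concrete, here is a minimal sketch that picks a model from the table above based on how much VRAM you have. The preference order (large first, then turbo, medium, small, base, tiny) is my own assumption, not an official ranking, and pick_model is a hypothetical helper.

```python
# (model name, approximate VRAM in GB) taken from the table above.
# The ordering is an assumed preference: most capable models first.
MODELS_BY_PREFERENCE = [
    ("large", 10),
    ("turbo", 6),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb):
    """Return the first model in preference order that fits, else None."""
    for name, required_gb in MODELS_BY_PREFERENCE:
        if required_gb <= available_vram_gb:
            return name
    return None
```

With 4 GB of VRAM, for instance, pick_model(4) returns small, since medium needs about 5 GB.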
Step 4: Subtitles are now generated. You can find the following files in the folder where you ran the command. By default, Whisper also writes video.tsv and video.json alongside them.
- video.txt: plain text transcript
- video.vtt: web subtitle format
- video.srt: common subtitle format (for YouTube, VLC, etc.)
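If you are curious what is inside the .srt file, each subtitle is a numbered cue with a start and end timestamp followed by the text. Here is a minimal sketch of that layout. The function names are my own and the sample segments in the usage note are made up for illustration; Whisper writes this file for you, so you don't need this code to follow the guide.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as SRT cues numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates one cue from the next.
    return "\n".join(cues)
```

For example, segments_to_srt([(0, 2.5, "Hello"), (2.5, 4, "World")]) produces two cues, the first spanning 00:00:00,000 --> 00:00:02,500.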
Step 5: Open the file to view subtitles. Go to the folder where your video is located and open the file named video.srt
Step 6 (Optional): If the video is not in English, you can translate it into English using the following command.
Command: whisper (your video or audio file name) --model medium --language (enter your language here) --task translate
Example: whisper video.mp4 --model medium --language Japanese --task translate
Step 7: Now, return to the folder where the video file is located. There you will find the video.srt file, which contains English subtitles for the video you used.
How to Use Subtitles for the Video
You can import subtitles to any third-party media player and play the video with subtitles. VLC is the best third-party media player available for both Windows and macOS. Here’s how to use it to watch movies with subtitles you’ve generated using Whisper.
Download VLC media player (free)
- For Windows
- For macOS
Step 1: Once downloaded, install the VLC media player on your device.
Step 2: Open VLC on your Windows PC and navigate to the Playlist tab, then click on Open Media.
Step 3: From here, click Browse and select the video file, then enable Add Subtitle File, select the subtitle file you generated using Whisper AI, and click on Open.
Step 4: That’s it. The video now plays with the subtitles you generated.
Final Thoughts
Over the years, these limitations have frustrated me. VLC has announced AI-generated captions that work in real time, but they’re still not available to regular users. Even more concerning, that announcement was their last post on Twitter, so I was losing hope until I figured out Whisper.
This is how you can generate subtitles for any video, including full-length movies, depending on your device’s resources. If your device isn’t powerful, you can run Whisper AI on Google Colab. Beyond generating subtitles, you can also translate them into English. I hope you find this guide helpful. I ensured that each step is detailed to make this guide beginner-friendly. If you still have any issues while setting up or running Whisper AI locally on your device, you can comment below.
FAQs on Generating Subtitles Using Whisper AI
1. Can I use Whisper to automatically add subtitles to YouTube videos I upload?
No. Whisper works on files stored locally on your device, so you need to generate the subtitle file first and then upload it to YouTube along with your video. It doesn’t automatically add subtitles to the YouTube videos you upload.
2. Does Whisper work offline after installation?
Yes, once the required model is downloaded, it operates offline, eliminating the need for an internet connection.
3. How can I speed up transcription on older or low-end devices?
To speed up transcription, use a smaller model. You can also extract the audio track from a large video file so that Whisper works with a much smaller file. To do this, use the following command in the terminal: ffmpeg -i video.mp4 -q:a 0 -map a audio.mp3
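If you have several videos to convert, the same ffmpeg command can be built in a script. This is a minimal sketch: build_extract_audio_command is a hypothetical helper, and naming the output after the video (video.mp4 becomes video.mp3) is my own choice rather than the audio.mp3 name used above.

```python
import subprocess
from pathlib import Path


def build_extract_audio_command(video_file, audio_file=None):
    """Build the ffmpeg argument list that saves the audio track as MP3."""
    if audio_file is None:
        # Assumed default: reuse the video name with an .mp3 extension.
        audio_file = str(Path(video_file).with_suffix(".mp3"))
    # -map a keeps only the audio streams; -q:a 0 asks for the best
    # variable-bitrate quality.
    return ["ffmpeg", "-i", video_file, "-q:a", "0", "-map", "a", audio_file]


def extract_audio(video_file, audio_file=None):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_extract_audio_command(video_file, audio_file), check=True)
```

You can then pass the resulting .mp3 file to Whisper instead of the original video.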
4. Can Whisper detect speakers or separate dialogues automatically?
No, currently Whisper cannot detect speakers or separate dialogues automatically; you need a separate speaker diarization tool, such as pyannote.audio or WhisperX, to tell speakers apart.
5. Are Whisper-generated subtitles accurate for noisy or accented speech?
Yes, it is one of the best transcription models. Accuracy also depends on the model you are using: larger models are generally more accurate, but they take more time and require more powerful hardware to run.















