This free tool uses AI to transcribe your video, burn karaoke-style captions, add a title card, and animate a closing CTA — all on your own Windows computer. No subscription. No monthly fees. Download it and it’s yours forever.
Drop your name and email below and I'll send you the tool. I may occasionally share useful updates and podcast growth tips — unsubscribe any time.
Your download is starting automatically.
If it doesn't begin, use the button below.
Open the zip → read README.txt → run install_deps.py
WhisperX generates word-level timestamps so each word highlights exactly when it's spoken.
Bold Bayon font, white text, black outline. The current word turns cyan; everything else stays white.
The title card is a black rounded badge shown for the first 4 seconds. Its font auto-sizes to fill the frame.
The closing CTA is two pill badges that pop in for the last 4 seconds: "Follow for more" + your custom channel tagline.
Add a Claude API key (~$0.003/video) to auto-generate titles, descriptions, and tags.
Free YouTube Data API integration uploads your finished video as a private draft.
Most video captioning tools are cloud-based, charge monthly subscriptions, and require you to upload your files to someone else's server. This tool is different: it runs entirely on your own Windows computer using WhisperX for AI transcription, so your content never leaves your machine.
Whether you're repurposing podcast episodes into TikTok clips, YouTube Shorts, or Instagram Reels, this tool handles the tedious captioning work so you can focus on creating great content.
Is this really free? What's the catch?
Yes, it's genuinely free! No trial period. No feature limits. No watermarks. I built this tool to solve my own podcast clip workflow and decided to share it. I don't have a newsletter, but if I share more free tools or podcast tips in the future, you'll be first to know. You can opt out of hearing from me anytime.
What are the system requirements?
Windows 10 or 11 with Python 3.10+. An NVIDIA GPU is recommended for faster transcription via CUDA acceleration, but the tool works on CPU-only machines too (just slower). The installer handles all dependencies automatically.
Will this be available for macOS?
Probably not from me any time soon. I built this tool for my own workflow on my award-winning video podcast, Crafty Brewers, and I edit on Windows. That said, if you're on a Mac, feel free to upload the tool's source files to Claude and ask it to adapt them for macOS.
Does my video get uploaded to the cloud?
No. Everything runs locally on your computer. Your video files, transcriptions, and finished output never leave your machine. The only network calls come from the two optional features, Claude API metadata generation and YouTube upload, and you control both.
What caption style does it use?
Bold Bayon font with white text and black outline. The current spoken word highlights in cyan while everything else stays white. It's a karaoke-style effect that keeps viewers reading along. It's designed for vertical (9:16) video but technically works with any aspect ratio (although it may not look pretty).
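For the curious, here is a minimal sketch of how a per-word highlight like this could be expressed, assuming the captions are written as an ASS subtitle file and the word timings arrive as dictionaries with `word`, `start`, and `end` keys (the shape of WhisperX's aligned word output). The function names and the `Default` style name are illustrative assumptions, not the tool's actual code.

```python
# Sketch only: one ASS "Dialogue" event per spoken word, with that word in cyan.
# Assumes a "Default" style (font, white fill, black outline) is defined elsewhere.

CYAN = r"{\c&HFFFF00&}"   # ASS colours are written blue-green-red, so cyan is FFFF00
WHITE = r"{\c&HFFFFFF&}"  # switch back to white after the highlighted word


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    total_cs = int(round(seconds * 100))
    hours, rem = divmod(total_cs, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def karaoke_events(words: list[dict]) -> list[str]:
    """Return one Dialogue line per word; the current word is wrapped in cyan."""
    events = []
    for i, current in enumerate(words):
        parts = []
        for j, other in enumerate(words):
            text = other["word"]
            parts.append(f"{CYAN}{text}{WHITE}" if i == j else text)
        line = " ".join(parts)
        start = ass_time(current["start"])
        end = ass_time(current["end"])
        events.append(f"Dialogue: 0,{start},{end},Default,,0,0,0,,{line}")
    return events


sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.5, "end": 1.25},
]
sample_events = karaoke_events(sample_words)
for event in sample_events:
    print(event)
```

Each event shows the whole phrase for the duration of one word, so the highlight appears to move from word to word as the events play back to back.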
Can I customize the captions, title card, or CTA?
Yes. The configuration file lets you set your channel name, custom tagline for the CTA overlay, title text, and more. The tool generates everything from your settings each time you run it.
How is this different from VEED, Captions, or CapCut?
Those tools run in the cloud; your video is uploaded to their servers, processed there, and often watermarked unless you pay. This tool runs entirely on your own Windows computer. Nothing is uploaded anywhere unless you turn on the optional YouTube upload (or alter the tool). You own the process, the output, and the tool itself. And you pay nothing, ever.
Will this work on my horizontal video, too?
Technically yes, but the captions will look gigantic on a horizontal (16:9) video. I built this tool specifically for 9:16 vertical video, so the captions, title card sizing, and CTA layout are all calibrated for vertical. Feel free to test it to see for yourself!
What video formats does it support?
The tool accepts any video format that FFmpeg can read, which covers virtually everything: MP4, MOV, MKV, AVI, WebM, and more. Output is always MP4 (H.264), optimized for vertical platforms like TikTok, YouTube Shorts, and Instagram Reels.
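As a rough illustration of the "any input, MP4 (H.264) out" step, the sketch below builds an FFmpeg command that burns an ASS caption file into a video and re-encodes it. The function name, file names, and the specific encoder flags are assumptions chosen for the example, not the tool's actual settings.

```python
# Sketch only: build (but do not run) an FFmpeg command line.
# Requires an FFmpeg build with libass for the "ass" filter to be available.


def build_burn_command(video_in: str, captions_ass: str, video_out: str) -> list[str]:
    """Return an FFmpeg argument list that burns captions and outputs H.264 MP4."""
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-i", video_in,                # any container/codec FFmpeg can read
        "-vf", f"ass={captions_ass}",  # render the ASS captions onto the frames
        "-c:v", "libx264",             # H.264 video
        "-pix_fmt", "yuv420p",         # widely compatible pixel format
        "-c:a", "aac",                 # AAC audio, the usual pairing for MP4
        "-movflags", "+faststart",     # move the index to the front for web playback
        video_out,
    ]


cmd = build_burn_command("clip.mov", "captions.ass", "clip_captioned.mp4")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)`. Using a relative path for the caption file sidesteps the escaping that Windows drive-letter colons would otherwise need inside an FFmpeg filter argument.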
How accurate is the transcription?
Very. The tool uses WhisperX, which builds on OpenAI's Whisper model and adds word-level alignment, and which consistently outperforms most cloud-based captioning services on natural spoken audio. For podcast content (clear speech, consistent audio quality), accuracy is typically 95%+. A text editor is included so you can fix any errors before burning captions to the video.
How long does it take to caption a video?
On a machine with an NVIDIA GPU, a typical 60-second clip takes under two minutes end-to-end: transcription, caption burn, title card, and CTA animation. On CPU-only it's slower (roughly 3–5x, so up to about 6–10 minutes for the same clip) but still fully functional.
Who made this?
I’m Cody Gough, a podcast growth strategist who has managed $1M+ in podcast advertising budgets and driven $4M+ in measurable revenue. I built this to streamline my own short-form video workflow for my award-winning video podcast, Crafty Brewers, and decided to share it with the community.