YouTube's auto-generated captions exist, but they are often unreliable, missing on many videos, and awkward to export as clean text. A more dependable approach is to download the audio and transcribe it yourself, which usually takes only a couple of minutes. Here's the full workflow.
You need the audio as an MP3 or M4A file. yt-dlp is the most reliable tool for this — it's free, open source, and works on Windows, Mac, and Linux.
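```bash
# macOS (with Homebrew)
brew install yt-dlp
# Linux (avoid sudo pip, which can overwrite system-managed packages)
python3 -m pip install --user -U yt-dlp
# Windows: download the .exe from github.com/yt-dlp/yt-dlp/releases
```

With yt-dlp installed, extract the audio track:

```bash
yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=VIDEO_ID"
```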
This downloads only the audio track and converts it to MP3. The conversion step needs ffmpeg installed and on your PATH. The file lands in your current directory, named after the video title and ID.
If you want a specific output name:
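```bash
yt-dlp -x --audio-format mp3 -o "transcript-source.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"
```

The `%(ext)s` placeholder lets yt-dlp apply the right extension after conversion, so the result is `transcript-source.mp3`.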
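The download commands depend on both yt-dlp and ffmpeg being on your PATH. A minimal sketch of a pre-flight check is shown here; the function names `missing_tools` and `require_tools` are hypothetical helpers for illustration, not part of either tool.

```bash
# Print each given command that is not found on PATH (prints nothing if all exist).
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return non-zero and print a hint on stderr if any of the given commands is missing.
require_tools() {
  missing=$(missing_tools "$@")
  if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
    return 1
  fi
}
```

One way to use it is to chain it in front of the download, for example `require_tools yt-dlp ffmpeg && yt-dlp -x --audio-format mp3 "URL"`, so the download only starts when both tools are available.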
Go to scribeforge.tech — no account needed.
Click "Upload audio" or drag the file onto the upload area. ScribeForge accepts MP3, M4A, WAV, OGG, FLAC, WEBM, MP4 — up to 25 MB per file.
Processing takes 10–30 seconds depending on file length. A 10-minute video typically finishes in under 20 seconds.
The text appears with paragraph breaks and timestamps. Click "Copy" or download as .txt.
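Before uploading, it can help to confirm the file is within the 25 MB limit. The sketch below is one way to do that from the shell. It assumes "25 MB" means 25,000,000 bytes, which is a conservative guess rather than a documented figure, and `fits_upload_limit` is a hypothetical helper name.

```bash
# Assumption: the 25 MB upload limit is treated as 25,000,000 bytes.
UPLOAD_LIMIT_BYTES=25000000

# Succeed if the file is no larger than the limit.
# Usage: fits_upload_limit FILE [LIMIT_BYTES]
# Returns 2 if FILE does not exist.
fits_upload_limit() {
  file=$1
  limit=${2:-$UPLOAD_LIMIT_BYTES}
  [ -f "$file" ] || return 2
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le "$limit" ]
}
```

For example, `fits_upload_limit transcript-source.mp3 || echo "Too large, split it first"` prints the warning for an oversized file.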
At 128 kbps, MP3 audio takes about 0.96 MB per minute, so a 25 MB file holds roughly 26 minutes of speech. Shorter videos fit in a single upload. For longer videos, you can split the audio into chunks with ffmpeg:
```bash
# Split into 20-minute segments (1200 seconds) without re-encoding
ffmpeg -i transcript-source.mp3 -f segment -segment_time 1200 -c copy part%03d.mp3
```
Transcribe each part separately and join the text. The timestamps in each segment start from 0, so you'll need to add the offset manually if exact timing matters.
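The offset arithmetic can be sketched as a small shell helper. It assumes timestamps look like `MM:SS` or `HH:MM:SS` (the actual transcript format may differ) and that every part is exactly `-segment_time` seconds long. Stream-copied cuts should be close to that but may not be exact. The names `to_seconds` and `shift_timestamp` are hypothetical.

```bash
# Convert MM:SS or HH:MM:SS to a number of seconds.
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Map a timestamp from part N back onto the original timeline.
# Usage: shift_timestamp TIMESTAMP PART_INDEX [SEGMENT_SECONDS]
# PART_INDEX is 0-based, matching part000.mp3, part001.mp3, ...
# SEGMENT_SECONDS defaults to 1200, the 20-minute segment length.
shift_timestamp() {
  ts=$1
  part=$2
  seg=${3:-1200}
  total=$(( $(to_seconds "$ts") + part * seg ))
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}
```

For instance, a phrase stamped `03:15` in `part002.mp3` sits 40 minutes further into the original, so `shift_timestamp 03:15 2` prints `00:43:15`.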
| | YouTube auto-captions | ScribeForge + yt-dlp |
|---|---|---|
| Available on all videos? | No — creator must allow it | Yes (if you can download) |
| Accuracy | Mediocre on accents/technical terms | High (scribe_v2 model) |
| Export as clean text | No (only VTT/SRT) | Yes — plain .txt |
| Timestamps | Yes | Yes (phrase-level) |
| Speaker separation | No | No (single speaker per segment) |
| Account needed | No (viewing), yes (download) | No |
If you're a creator transcribing your own videos, use the original recording (before upload) rather than the YouTube version. The original is higher quality and should give you a more accurate transcript. Publishing the transcript alongside your video can also help SEO: search engines can index the text, which gives the page a chance to rank for long-tail keywords that appear in it.
Got your MP3? Transcribe it now — free, no account, results in seconds.