How to Transcribe a Zoom Recording — Free, in 10 Seconds
Yes — any Zoom recording (cloud or local) can be transcribed in your browser. Drop the audio_only.m4a (or the .mp4) onto ScribeForge and Grok STT returns the transcript in 10–30 seconds with phrase-level timestamps. Speaker labels may appear when the voices are clearly separated. No code, no install, no signup.
Where Zoom saves your recording
| Type | Where it lives | Best file to use |
|---|---|---|
| Cloud recording | zoom.us/recording (sign in) | audio_only.m4a — click Download next to it |
| Local (Mac) | ~/Documents/Zoom/<date>/ | audio_only.m4a |
| Local (Windows) | %USERPROFILE%\Documents\Zoom\<date>\ | audio_only.m4a |
Use audio_only.m4a whenever it exists. It is roughly 10× smaller than the video.mp4 and produces an identical transcript — the video adds nothing for speech-to-text.
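For local recordings, the lookup in the table above can be scripted. The following is a minimal sketch, not part of ScribeForge: the function names are hypothetical, the folder is the default Zoom location from the table (yours may differ if you changed it in Zoom's settings), and the 100 MB cap is assumed to mean 100,000,000 bytes.

```python
from pathlib import Path

# Assumption: the "100 MB" cap is read as 100,000,000 bytes (the conservative reading).
MAX_UPLOAD_BYTES = 100_000_000


def find_zoom_audio(zoom_dir: Path) -> list[Path]:
    """Return every audio_only*.m4a under zoom_dir, newest first."""
    matches = zoom_dir.rglob("audio_only*.m4a")
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


def fits_upload_cap(path: Path, cap_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """True when the file is small enough to upload without splitting."""
    return path.stat().st_size <= cap_bytes


if __name__ == "__main__":
    # Default local folder from the table (same relative path on Mac and Windows).
    default_dir = Path.home() / "Documents" / "Zoom"
    if default_dir.is_dir():
        for audio in find_zoom_audio(default_dir):
            status = "ready to upload" if fits_upload_cap(audio) else "split first"
            size_mb = audio.stat().st_size / 1_000_000
            print(f"{audio}  ({size_mb:.1f} MB, {status})")
    else:
        print(f"No Zoom folder at {default_dir}")
```

Run it as a script to list your local recordings, newest first, with a note on whether each one fits under the upload cap.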
The 3-step workflow
- Open ScribeForge — go to scribeforge.tech in any browser. Mobile or desktop, both work.
- Drop the .m4a onto the upload zone. Files up to 100 MB are accepted, enough for many audio_only.m4a Zoom exports at default quality.
- Click Transcribe. Grok STT returns the transcript in 10–30 seconds with phrase-level timestamps. Speaker labels may appear when the recording has clear separation. Copy, download as .txt, or upgrade for unlimited.
If your recording is larger than 100 MB
Split it into chunks with ffmpeg:
ffmpeg -i meeting.m4a -f segment -segment_time 1500 -c copy chunk%03d.m4a
With -segment_time 1500 (seconds), this produces chunks of about 25 minutes each (chunk000.m4a, chunk001.m4a…) that are easier to upload. Transcribe each chunk individually and concatenate the transcripts in order.
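The right -segment_time depends on the file's bitrate, so a fixed value will not suit every recording. As a rough sketch, the chunk length can be estimated from the file size and the meeting duration. The helper names below are hypothetical, the cap is assumed to be 100,000,000 bytes, and the estimate assumes a roughly constant bitrate, which is why it keeps a 10% safety margin.

```python
MAX_UPLOAD_BYTES = 100_000_000  # assumption: "100 MB" read as 100,000,000 bytes


def segment_seconds(file_bytes: int, duration_s: int,
                    cap_bytes: int = MAX_UPLOAD_BYTES,
                    safety_percent: int = 90) -> int:
    """Estimate a chunk length (in whole seconds) that stays under the cap.

    Integer arithmetic only, so the result is exact and rounded down.
    """
    if file_bytes <= 0 or duration_s <= 0:
        raise ValueError("file_bytes and duration_s must be positive")
    seconds = cap_bytes * safety_percent * duration_s // (100 * file_bytes)
    return max(1, seconds)


def ffmpeg_split_command(src: str, seconds: int) -> list[str]:
    """Build the same ffmpeg invocation as above with a computed segment time."""
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(seconds), "-c", "copy", "chunk%03d.m4a"]


if __name__ == "__main__":
    # Example: a 250 MB file covering a 3-hour meeting.
    secs = segment_seconds(250_000_000, 3 * 3600)
    print(" ".join(ffmpeg_split_command("meeting.m4a", secs)))
```

For the 250 MB, 3-hour example this yields a segment time of 3888 seconds, so each chunk should land near 90 MB. Check the chunk sizes after splitting, since real recordings do not have a perfectly constant bitrate.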
What to do with the transcript
- Search: press Cmd+F (Ctrl+F on Windows) to search an hour of meeting text in seconds.
- Summarize: paste the transcript into Grok chat, Claude, or GPT and ask for action items.
- Share: send the .txt file to teammates who missed the meeting.
- Compliance: keep the text record for legal or audit requirements without storing the audio file.
- Captions: phrase-level timestamps can be converted to SRT or VTT for accessibility (a built-in export feature is on the roadmap).
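Until a built-in caption export is available, phrase-level timestamps can be turned into SRT by hand. The sketch below assumes each phrase is available as a (start_seconds, end_seconds, text) tuple. That shape is an assumption for illustration, not ScribeForge's actual output format, so adapt the parsing to whatever your transcript contains.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(phrases: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each cue is a 1-based index, a time range line, and the phrase text;
    cues are separated by a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(phrases, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{time_range}\n{text.strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical phrases; real ones would come from your transcript.
    demo = [(0.0, 2.5, "Welcome, everyone."), (2.5, 6.0, "Let's start with updates.")]
    print(to_srt(demo))
```

If you split a long recording into chunks and each chunk's timestamps restart at zero, add that chunk's starting offset to every start and end value before converting.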
Common questions
Both the .m4a and the .mp4 work. Grok STT extracts the audio track from .mp4 automatically. The .m4a is preferred only because it is smaller and uploads faster.
Speaker labels appear only sometimes. Grok may return them (Speaker 1, Speaker 2, …) when each voice is distinct enough, but they are not guaranteed on every Zoom recording. Timestamps are more reliable than speaker attribution today.
Your audio is not stored. It is sent to xAI's Grok STT API for processing and deleted immediately. The text transcript exists only in your browser tab; close the tab and it is gone.
Long recordings are rejected only if the file is above the 100 MB browser cap. Anything below 100 MB processes regardless of length. If an upload is denied for any other reason, the response includes the actual error from xAI.
Drop a Zoom recording and read it in 30 seconds.
Transcribe Zoom recording free →
No account · No credit card · 2 free uses/day per IP