How to Transcribe a Zoom Recording — Free, in 10 Seconds
Yes — any Zoom recording (cloud or local) can be transcribed for free in your browser. Drop the audio_only.m4a (or the .mp4) onto ScribeForge and Grok STT returns the transcript in 10–30 seconds, with phrase-level timestamps and per-speaker labels. No code, no install, no signup.
Where Zoom saves your recording
| Type | Where it lives | Best file to use |
|---|---|---|
| Cloud recording | zoom.us/recording (sign in) | audio_only.m4a — click Download next to it |
| Local (Mac) | ~/Documents/Zoom/<date>/ | audio_only.m4a |
| Local (Windows) | %USERPROFILE%\Documents\Zoom\<date>\ | audio_only.m4a |
Use audio_only.m4a whenever it exists. It is roughly 10× smaller than the video.mp4 and produces an identical transcript — the video adds nothing for speech-to-text.
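If you have many local recordings, a small shell helper can list every audio_only.m4a with its size so you can pick the right file. This is a minimal sketch: the function name list_zoom_audio is hypothetical, and the default folder is simply the Mac path from the table above.

```shell
#!/bin/sh
# List every audio_only.m4a under a Zoom recordings folder.
# Output: one line per file, "<size in bytes><TAB><path>", sorted by path.
# The folder defaults to ~/Documents/Zoom (assumption: the Mac default above);
# pass another folder as the first argument if yours differs.
list_zoom_audio() (
    zoom_dir="${1:-$HOME/Documents/Zoom}"
    find "$zoom_dir" -type f -name 'audio_only.m4a' | sort | while IFS= read -r f; do
        # wc -c pads its output with spaces on some systems, so strip them.
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s\t%s\n' "$size" "$f"
    done
)
```

Calling list_zoom_audio with no argument scans the default folder; list_zoom_audio /path/to/recordings scans a folder of your choice.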
The 3-step workflow
- Open ScribeForge — go to scribeforge.tech in any browser. Mobile or desktop, both work.
- Drop the .m4a onto the upload zone. Files up to 25 MB are accepted, which is about 25 minutes of audio_only.m4a at default Zoom quality.
- Click Transcribe. Grok STT returns the transcript in 10–30 seconds with phrase-level timestamps and speaker labels. Copy, download as .txt, or upgrade for unlimited.
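Before uploading, you can check locally whether a file fits under the cap. The sketch below assumes "25 MB" means 25 × 1024 × 1024 bytes; the site may count megabytes differently, so treat files close to the limit with caution. The name fits_upload_cap is hypothetical.

```shell
#!/bin/sh
# Assumption: the 25 MB cap is 25 * 1024 * 1024 bytes.
CAP_BYTES=$((25 * 1024 * 1024))

# Exit status 0 when the file is at or below the cap, non-zero otherwise.
# An optional second argument overrides the cap (in bytes).
fits_upload_cap() (
    file="$1"
    cap="${2:-$CAP_BYTES}"
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -le "$cap" ]
)

# Usage:
#   if fits_upload_cap audio_only.m4a; then
#       echo "OK to upload"
#   else
#       echo "Too large, split it first"
#   fi
```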
If your recording is larger than 25 MB
Split it into chunks with ffmpeg:
ffmpeg -i meeting.m4a -f segment -segment_time 1500 -c copy chunk%03d.m4a
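This produces chunks of roughly 25 minutes each (chunk000.m4a, chunk001.m4a…). At default Zoom quality each chunk should land close to the 25 MB cap; if one comes out larger, lower -segment_time (for example to 1200) and split again. Transcribe each chunk individually and concatenate the transcripts in order.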
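Two small helpers can take the guesswork out of this step. The sketch below is illustrative only: segment_seconds picks a -segment_time that keeps chunks about 10% under the cap for a given file size and duration, and merge_chunk_transcripts joins the per-chunk transcripts with a header showing where each chunk starts in the original meeting. Both function names, the 10% margin, the 25 × 1024 × 1024 byte cap, and the chunk000.txt transcript file names are assumptions, not part of ScribeForge.

```shell
#!/bin/sh
# Suggest a -segment_time (whole seconds) so each chunk stays ~10% under the cap.
# Args: file size in bytes, duration in whole seconds, optional cap in bytes.
segment_seconds() (
    size_bytes="$1"
    duration_seconds="$2"
    cap_bytes="${3:-$((25 * 1024 * 1024))}"   # assumption: 25 MB = 25 MiB
    echo $((duration_seconds * cap_bytes * 9 / (size_bytes * 10)))
)

# Join per-chunk transcripts in the order given, printing each chunk's
# approximate start offset so timestamps can be mapped back to the meeting.
# Args: chunk length in seconds, then the transcript files.
merge_chunk_transcripts() (
    chunk_seconds="$1"
    shift
    index=0
    for f in "$@"; do
        start=$((index * chunk_seconds))
        printf '=== chunk %d, starts at %02d:%02d:%02d ===\n' \
            "$index" $((start / 3600)) $((start % 3600 / 60)) $((start % 60))
        cat "$f"
        printf '\n'
        index=$((index + 1))
    done
)

# Usage (requires ffprobe, which ships with ffmpeg):
#   size=$(wc -c < meeting.m4a | tr -d ' ')
#   dur=$(ffprobe -v error -show_entries format=duration \
#         -of default=noprint_wrappers=1:nokey=1 meeting.m4a)
#   secs=$(segment_seconds "$size" "${dur%.*}")
#   ffmpeg -i meeting.m4a -f segment -segment_time "$secs" -c copy chunk%03d.m4a
#   merge_chunk_transcripts "$secs" chunk000.txt chunk001.txt > meeting.txt
```

The offsets in the headers are approximate, because stream-copied segments do not always break at exactly the requested time.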
What to do with the transcript
- Search — Cmd+F across an hour of meeting audio in seconds.
- Summarize — paste the transcript into Grok chat, Claude, or GPT and ask for action items.
- Share — send the .txt file to teammates who missed the meeting.
- Compliance — keep the text record for legal or audit requirements without storing the audio file.
- Captions — phrase-level timestamps export to SRT or VTT for accessibility (export feature on the roadmap).
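For the search use case, the downloaded .txt file can also be searched from a terminal. This is a minimal sketch using standard grep; the function name search_transcript and the file name meeting.txt are hypothetical.

```shell
#!/bin/sh
# Print every transcript line containing the term, prefixed with its line number.
# -i ignores case, -F treats the term as literal text rather than a regex.
search_transcript() (
    term="$1"
    file="$2"
    grep -n -i -F -- "$term" "$file"
)

# Usage:
#   search_transcript "action item" meeting.txt
```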
Common questions
Both .m4a and .mp4 uploads work. Grok STT extracts the audio track from the .mp4 automatically. The .m4a is preferred only because it is smaller and uploads faster.
Speakers are labeled automatically. Grok returns diarization (Speaker 1, Speaker 2, …) per phrase. Accuracy depends on each speaker having a distinguishable voice and at least a few seconds of speech.
Recordings are not stored. Audio is sent to xAI's Grok STT API for processing and deleted immediately afterward. The text transcript exists only in your browser tab; close it and it is gone.
Long recordings are rejected only if they are above the 25 MB browser cap. Anything below 25 MB processes regardless of length. If a file is rejected for any other reason, the response includes the actual error from xAI.
Drop a Zoom recording and read it in 30 seconds.
Transcribe Zoom recording free →
No account · No credit card · 2 free uses/day per IP