How to Transcribe Audio with Grok — No API Key, Free
No API key needed. ScribeForge is a browser wrapper around the xAI Grok speech-to-text (STT) API: you upload your audio, Grok transcribes it, and you get the text back with timestamps. Speaker labels may appear when the voices in a recording are clearly separated. Two free uses per day, no signup.
Drop your audio file and transcribe it now.
Transcribe with Grok for free: no account, no API key, 2 free uses per day per IP.
Why you can use Grok STT without an API key
The Grok Speech-to-Text API launched on April 18, 2026. Calling it directly requires an xAI account, an API key, and code that sends the audio as a multipart upload. ScribeForge handles all of that: you get the same Grok STT output through a drag-and-drop interface in your browser.
Under the hood, ScribeForge calls POST https://api.x.ai/v1/stt with the scribe_v2 model, parses the word-level timestamps, groups them into readable segments, and shows them in your browser. The audio is processed and then immediately discarded — nothing is stored.
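The grouping step can be sketched as follows. This is a minimal illustration, not ScribeForge's code: it assumes each word arrives with its text and start/end times in seconds, and the `Word` and `Segment` types, the 0.8-second pause threshold, and the 20-word cap are hypothetical choices. A new segment begins after sentence-ending punctuation, after a long pause, or when the current segment gets too long.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word shape; the real Grok STT response schema may differ.
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class Segment:
    start: float
    end: float
    text: str


def _close(words):
    """Merge a run of words into one segment spanning their start/end times."""
    return Segment(start=words[0].start, end=words[-1].end,
                   text=" ".join(w.text for w in words))


def group_words(words, max_gap=0.8, max_words=20):
    """Group word-level timestamps into phrase-level segments.

    Starts a new segment when the previous word ends a sentence, when the
    pause before the next word exceeds max_gap seconds, or when the current
    segment already holds max_words words.
    """
    segments = []
    current = []
    for w in words:
        if current:
            prev = current[-1]
            pause = w.start - prev.end
            if (prev.text.endswith((".", "?", "!"))
                    or pause > max_gap
                    or len(current) >= max_words):
                segments.append(_close(current))
                current = []
        current.append(w)
    if current:
        segments.append(_close(current))
    return segments
```

With the sample words Hello / there. / How / are / you? / Fine (the last one after a pause), this yields three segments: "Hello there.", "How are you?", and "Fine".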
Supported audio formats
| Format | Extension | Typical source |
|---|---|---|
| MP3 | .mp3 | Podcasts, voice recordings, downloaded audio |
| WAV | .wav | Studio recordings, interviews, dictation |
| M4A | .m4a | iPhone voice memos, Zoom/Teams audio exports |
| MP4 | .mp4 | Screen recordings, YouTube downloads |
| OGG / Opus | .ogg/.opus | WhatsApp voice notes, browser recordings |
| FLAC | .flac | Lossless audio, archival recordings |
| AAC | .aac | Android voice notes, streaming captures |
| MKV | .mkv | Video calls, game recordings |
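The table shows the most common of the 12 audio formats Grok STT accepts; ScribeForge supports all of them. File size limit: 100 MB per upload, roughly 100 minutes of MP3 at 128 kbps (128 kbps is 16 KB per second, about 0.96 MB per minute).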
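A pre-upload check along these lines can catch problems before a file is sent. It is a sketch under stated assumptions: the extension set covers only the formats in the table (not all 12), the limit is treated as 100 decimal megabytes, and the function names are hypothetical.

```python
import os

# Extensions from the table above only; Grok STT may accept others not listed.
LISTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4", ".ogg", ".opus",
                     ".flac", ".aac", ".mkv"}
# Assumes the 100 MB limit is measured in decimal megabytes.
MAX_UPLOAD_BYTES = 100 * 1000 * 1000


def minutes_that_fit(bitrate_kbps, limit_bytes=MAX_UPLOAD_BYTES):
    """Approximate minutes of audio that fit under the limit at a constant bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second / 60


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in LISTED_EXTENSIONS:
        problems.append(f"extension {ext or '(none)'} is not in the format table")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes / 1e6:.1f} MB, over the 100 MB limit")
    return problems
```

`minutes_that_fit(128)` comes out at about 104.2, which is where the "roughly 100 minutes" figure comes from. `check_upload("memo.M4A", 5_000_000)` returns an empty list, since the extension check ignores case.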
Transcribe audio with Grok in 3 steps
- Open scribeforge.tech in any browser — desktop or mobile, no install.
- Drop your audio file onto the upload zone, or click to browse. Any format from the table above.
- Read the transcript. Grok STT returns the full text in seconds, and ScribeForge displays it with phrase-level timestamps. Copy it or download it as a .txt file.
What Grok STT returns
- Full transcript: verbatim text of everything spoken.
- Phrase-level timestamps: each phrase anchored to its start time in the audio, built from Grok's word-level timestamps.
- Speaker labels: may appear as Speaker 1 / Speaker 2 when multiple voices are clearly separated.
- Language detection: Grok auto-detects among 25+ languages with no input from you.
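One way to turn timestamped, optionally labeled segments into a plain-text transcript is sketched below. The `[HH:MM:SS] Speaker N: text` layout and the `Segment` type are assumptions for illustration; the .txt file ScribeForge produces may be laid out differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: float                   # seconds from the start of the audio
    text: str
    speaker: Optional[str] = None  # e.g. "Speaker 1"; None when no label was assigned


def format_timestamp(seconds):
    """Format seconds as HH:MM:SS, dropping fractional seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_txt(segments):
    """Render one line per segment: [HH:MM:SS] optional speaker label, then text."""
    lines = []
    for seg in segments:
        label = f"{seg.speaker}: " if seg.speaker else ""
        lines.append(f"[{format_timestamp(seg.start)}] {label}{seg.text}")
    return "\n".join(lines) + "\n"
```

Segments without a speaker label are written without a prefix, so the same function works whether or not Grok separated the voices.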
Common questions
Do I need an xAI account or API key?
No. ScribeForge gives you access to Grok STT without signing up for an xAI account or managing an API key. Free tier: 2 transcriptions per day per IP. For unlimited use, a paid plan is available.
Can I upload audio directly to Grok chat?
No. The Grok chat interface does not accept audio uploads; it works with text and images. The Grok STT API (POST /v1/stt) is a separate product, launched April 18, 2026, and it is the API ScribeForge uses.
How accurate is Grok STT?
xAI has published a 5.0% word error rate on a phone-call benchmark for Grok STT, but real-world accuracy depends on the microphone, overlapping speech, background noise, and compression. For important material, review the transcript before relying on it.
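STT benchmark percentages are usually word error rates (WER): substitutions, deletions, and insertions divided by the number of words in a reference transcript. Assuming that is what the 5.0% figure measures, it corresponds to about one error per twenty words. The sketch below computes WER with a word-level edit distance, which is useful if you want to score a ScribeForge transcript against one you corrected by hand. It compares words exactly, so normalize case and punctuation first if those differences should not count as errors.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein distance computed row by row.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words (starts with the empty reference prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, one wrong word in a 20-word reference gives a WER of 0.05, the same rate as the published figure.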
Does ScribeForge store my audio?
No. Audio is forwarded to xAI's Grok STT API for processing and deleted immediately after the transcript is returned. ScribeForge does not store audio or transcripts beyond your browser tab.
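What if my file is over 100 MB?
Split it with ffmpeg: ffmpeg -i input.mp3 -f segment -segment_time 3600 -c copy chunk%03d.mp3. This produces 60-minute chunks you can transcribe individually. At 128 kbps each chunk is about 58 MB; for files above roughly 220 kbps, lower -segment_time so every chunk stays under the 100 MB limit.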
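The segment length can also be derived from the file's bitrate instead of fixed at 3600 seconds. The sketch below picks the longest chunk that stays under 100 MB and builds the same ffmpeg command as an argument list. The 10% headroom, the roughly constant bitrate, and the function names are assumptions for illustration.

```python
import shlex


def max_segment_seconds(bitrate_kbps, limit_mb=100, headroom=0.9):
    """Longest chunk, in whole seconds, that stays under limit_mb at a constant bitrate.

    headroom reserves an assumed 10% margin for bitrate spikes and container overhead.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    return int(limit_mb * 1_000_000 * headroom // bytes_per_second)


def ffmpeg_split_args(src, segment_seconds, pattern="chunk%03d.mp3"):
    """Argument list for the ffmpeg segment command (stream copy, no re-encoding)."""
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(segment_seconds), "-c", "copy", pattern]


if __name__ == "__main__":
    # Keep the 60-minute default unless the bitrate forces shorter chunks.
    seconds = min(3600, max_segment_seconds(128))
    print(shlex.join(ffmpeg_split_args("input.mp3", seconds)))
```

For a 320 kbps file, `min(3600, max_segment_seconds(320))` gives 2250 seconds (37.5 minutes). With ffmpeg installed, the list can be run with `subprocess.run(args, check=True)`.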
Ready to transcribe your audio with Grok?
Open ScribeForge for free: no account, no API key, no credit card.