How to Transcribe Audio with Grok — No API Key, Free
No API key needed. ScribeForge is a browser wrapper around the xAI Grok STT API: you upload your audio, Grok transcribes it, and you get the text back with timestamps and speaker labels. Two free transcriptions per day, no signup.
Drop your audio file and transcribe it with Grok now. No account, no API key; 2 free uses per day per IP.
Why you can use Grok STT without an API key
The Grok Speech-to-Text API launched on April 18, 2026. Calling it directly requires an xAI account, an API key, and code that uploads your audio as a multipart request. ScribeForge handles all of that: you get the same Grok STT output through a drag-and-drop browser interface.
Under the hood, ScribeForge calls POST https://api.x.ai/v1/stt with the scribe_v2 model, parses the word-level timestamps, groups them into readable segments, and shows them in your browser. The audio is processed and then immediately discarded — nothing is stored.
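For readers curious what the direct route involves, here is a minimal sketch of building that multipart POST with Python's standard library. It only constructs the request; it does not send it. The endpoint and the `scribe_v2` model name come from this article; the form field names `model` and `file`, the Bearer-token header, and the helper name `build_stt_request` are assumptions for illustration, so check xAI's API reference before relying on them.

```python
import urllib.request
import uuid

STT_URL = "https://api.x.ai/v1/stt"  # endpoint named in this article


def build_stt_request(audio: bytes, filename: str, api_key: str,
                      model: str = "scribe_v2") -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST for Grok STT.

    ASSUMPTION: the field names "model" and "file" and the Bearer-token
    header are illustrative; verify them against xAI's API reference.
    """
    boundary = uuid.uuid4().hex  # random delimiter between form parts
    # Text part carrying the model name (hypothetical field name "model").
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File part header; the raw audio bytes follow it (hypothetical field "file").
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()  # terminates the form
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        STT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)`; the response schema isn't documented here, so parse it against xAI's reference rather than assuming field names.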
Supported audio formats
| Format | Extension | Typical source |
|---|---|---|
| MP3 | .mp3 | Podcasts, voice recordings, downloaded audio |
| WAV | .wav | Studio recordings, interviews, dictation |
| M4A | .m4a | iPhone voice memos, Zoom/Teams audio exports |
| MP4 | .mp4 | Screen recordings, YouTube downloads |
| OGG / Opus | .ogg/.opus | WhatsApp voice notes, browser recordings |
| FLAC | .flac | Lossless audio, archival recordings |
| AAC | .aac | Android voice notes, streaming captures |
| MKV | .mkv | Video calls, game recordings |
The table lists the most common formats; ScribeForge supports all 12 formats that Grok STT accepts. File size limit: 25 MB per upload, roughly 25 minutes of MP3 at 128 kbps.
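The size limit follows from constant-bitrate arithmetic: 128 kbps is 16,000 bytes per second, or 0.96 MB per minute. The sketch below assumes 1 MB = 1,000,000 bytes and ignores tags and container overhead; the function names are hypothetical.

```python
def cbr_size_mb(minutes: float, kbps: int) -> float:
    """Approximate size in MB (10**6 bytes) of constant-bitrate audio."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * minutes * 60 / 1_000_000


def max_minutes(limit_mb: float, kbps: int) -> float:
    """Longest constant-bitrate recording that fits under limit_mb."""
    bytes_per_second = kbps * 1000 / 8
    return limit_mb * 1_000_000 / bytes_per_second / 60
```

For example, 25 minutes at 128 kbps comes to 24 MB, and 25 MB holds about 26 minutes at that rate. At a fixed bitrate, mono and stereo files of the same length are the same size.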
Transcribe audio with Grok in 3 steps
- Open scribeforge.tech in any browser — desktop or mobile, no install.
- Drop your audio file onto the upload zone, or click to browse. Any format from the table above.
- Read the transcript. Grok STT returns the full text in 10–30 seconds, with phrase-level timestamps and per-speaker labels. Copy or download as .txt.
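The .txt download pairs each phrase with its timestamp and speaker label. ScribeForge's exact export layout isn't documented here, so this sketch uses a hypothetical `[HH:MM:SS] Speaker N: text` line format and an assumed `Phrase` shape.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    start: float   # seconds from the beginning of the audio (assumed field)
    speaker: int   # 1-based speaker number from diarization (assumed field)
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating fractions of a second."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def to_txt(phrases: list[Phrase]) -> str:
    """One line per phrase, each ending in a newline."""
    lines = (
        f"[{format_timestamp(p.start)}] Speaker {p.speaker}: {p.text}"
        for p in phrases
    )
    return "".join(line + "\n" for line in lines)
```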
What Grok STT returns
- Full transcript — verbatim text of everything spoken.
- Phrase-level timestamps: Grok's word-level timings, grouped by ScribeForge into phrases that are each anchored to their start time in the audio.
- Speaker diarization — automatic Speaker 1 / Speaker 2 labels when multiple voices are present.
- Language detection — Grok auto-detects among 25+ languages with no input from you.
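ScribeForge describes taking word-level timestamps and grouping them into readable segments. The article doesn't say which rule it uses or what Grok's response fields are called, so the `Word` fields, the 0.8-second pause threshold, and the split-on-sentence-punctuation rule below are all assumptions; this is one plausible grouping, not ScribeForge's actual code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float   # seconds (assumed field)
    end: float     # seconds (assumed field)
    speaker: int   # diarization label (assumed field)


@dataclass
class Segment:
    start: float
    end: float
    speaker: int
    text: str


def group_words(words: list[Word], max_gap: float = 0.8) -> list[Segment]:
    """Merge consecutive words into phrase segments.

    A new segment starts when the speaker changes, the silence between
    two words exceeds max_gap seconds, or the previous word ends a sentence.
    """
    segments: list[Segment] = []
    current: list[Word] = []

    def flush() -> None:
        # Close the open segment, if any, and start a fresh one.
        if current:
            segments.append(Segment(
                start=current[0].start,
                end=current[-1].end,
                speaker=current[0].speaker,
                text=" ".join(w.text for w in current),
            ))
            current.clear()

    for word in words:
        if current:
            prev = current[-1]
            if (word.speaker != prev.speaker
                    or word.start - prev.end > max_gap
                    or prev.text.endswith((".", "?", "!"))):
                flush()
        current.append(word)
    flush()
    return segments
```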
Common questions
Do I need an xAI account or API key?
No. ScribeForge provides access to Grok STT without requiring you to sign up for an xAI account or manage an API key. Free tier: 2 transcriptions per day per IP. For unlimited use, a paid plan is available.
Can I transcribe audio in the Grok chat app instead?
No. The Grok chat interface does not accept audio uploads; it is a text and image interface. The Grok STT API (POST /v1/stt) is a separate product, launched April 18, 2026, and it is the API ScribeForge uses.
How accurate is Grok STT?
On its phone-call benchmark, xAI reports a 5.0% word error rate for Grok, versus 12.0% for ElevenLabs and 13.5% for Deepgram. For general speech (podcasts, meetings, voice memos), accuracy is comparable to Whisper large-v3.
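Word error rate counts the substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words, so a 5.0% WER is roughly 5 errors per 100 spoken words. Below is a minimal sketch using word-level edit distance; the lowercase-and-split normalization is a simplifying assumption, since published benchmarks apply their own text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words,
    computed as word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```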
Does ScribeForge store my audio or transcripts?
No. Audio is forwarded to xAI's Grok STT API for processing and deleted immediately after the transcript is returned. ScribeForge does not store audio or transcripts beyond your browser tab.
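My file is larger than 25 MB. How do I transcribe it?
Split it with ffmpeg: `ffmpeg -i input.mp3 -f segment -segment_time 1500 -c copy chunk%03d.mp3`. This produces 25-minute (1500-second) chunks you can transcribe individually. At 128 kbps each chunk is about 24 MB; for higher-bitrate audio, use a shorter -segment_time.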
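If you split files often, a small helper can pick a segment length from the bitrate so each chunk stays under the 25 MB limit. This is a sketch: the helper names and the 10% safety margin are assumptions. With `-c copy`, ffmpeg cuts only on packet boundaries (keyframes, for video), so chunk lengths are approximate, and because stream copy doesn't re-encode, the output extension must match a container that accepts the input's codec (for example, keep .m4a for .m4a input).

```python
def split_command(src: str, segment_seconds: int = 1500,
                  pattern: str = "chunk%03d.mp3") -> list[str]:
    """Argument list for an ffmpeg segment-muxer split with stream copy."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-c", "copy",  # no re-encode: fast, but cuts land on packet boundaries
        pattern,
    ]


def safe_segment_seconds(kbps: int, limit_mb: float = 25.0,
                         margin: float = 0.9) -> int:
    """Longest segment (seconds) whose CBR size stays under margin * limit_mb."""
    bytes_per_second = kbps * 1000 / 8
    return int(limit_mb * 1_000_000 * margin / bytes_per_second)
```

For a 192 kbps file, `subprocess.run(split_command("input.mp3", safe_segment_seconds(192)), check=True)` would split it into chunks of about 937 seconds each.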
Ready to transcribe your audio with Grok?
Open ScribeForge for free. No account, no API key, no credit card.