Grok STT Online
Free Grok speech-to-text in your browser — no xAI account, no API key, no code. Powered by xAI scribe_v2.
What is Grok STT?
Grok Speech-to-Text is xAI's audio transcription API, built on the same voice stack that powers Grok Voice, Tesla in-car voice, and Starlink customer support. It was released on April 18, 2026 as a standalone API, called with POST https://api.x.ai/v1/stt.
It is not accessible through the Grok chat interface — the chat does not accept audio files. ScribeForge is a browser wrapper that calls the API on your behalf, so you get Grok STT without writing any code or creating an xAI account.
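For readers curious what "calling the API on your behalf" might involve, here is a minimal Python sketch that builds a multipart POST request for the endpoint named above. Only the URL comes from this page. The form field name `file`, the bearer-token header, and the helper name `build_stt_request` are assumptions for illustration and are not confirmed by xAI's documentation. The request is built but not sent.

```python
import urllib.request
import uuid

STT_URL = "https://api.x.ai/v1/stt"  # endpoint quoted on this page


def build_stt_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST request carrying one audio file."""
    boundary = uuid.uuid4().hex
    # ASSUMPTION: the upload field is called "file"; the real name may differ.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        # ASSUMPTION: bearer-token authentication.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        STT_URL, data=head + audio + tail, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response format is not described on this page, so parsing is left out.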
Supported formats
| Format | Extension | Common source |
|---|---|---|
| MP3 | .mp3 | Podcasts, downloaded audio, voice recorders |
| WAV | .wav | Studio recordings, legal dictation |
| M4A | .m4a | iPhone voice memos, Zoom/Teams audio |
| MP4 | .mp4 | Screen recordings, video calls |
| OGG / Opus | .ogg/.opus | WhatsApp voice notes, browser recordings |
| FLAC | .flac | Lossless/archival audio |
| AAC | .aac | Android recordings, streaming captures |
| MKV | .mkv | Game clips, Loom recordings |
How to use Grok STT online — 3 steps
- Go to scribeforge.tech — works in any browser, desktop or mobile.
- Drop your audio file onto the upload zone. Any format from the table above, up to 25 MB.
- Get the transcript. Grok STT processes the file in 10–30 seconds and returns the full text with timestamps and speaker labels. Copy it or download it as .txt.
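The upload constraints above (a supported extension and a 25 MB cap) can be checked before uploading. The sketch below is one way to do that locally. The function name `check_upload` is hypothetical, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption, since the page does not say which definition of megabyte applies.

```python
from pathlib import Path

# Extensions taken from the supported-formats table.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".mp4", ".ogg", ".opus", ".flac", ".aac", ".mkv",
}
# ASSUMPTION: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive: ".M4A" is treated as ".m4a"
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {size_mb:.1f} MB; the limit is 25 MB")
    return problems
```

For a real file, `check_upload(path.name, path.stat().st_size)` with a `pathlib.Path` supplies both arguments.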
Grok STT vs other online STT tools
| Tool | Engine | Free tier | API key required | Speaker labels |
|---|---|---|---|---|
| ScribeForge | Grok scribe_v2 | 2 transcriptions/day | No | Yes |
| Whisper.net | OpenAI Whisper | Limited | No | No |
| Deepgram Console | Nova-2 | $200 credit | Yes | Yes |
| AssemblyAI Playground | Universal-2 | 5h/month | Yes | Yes |
Common questions
Grok STT works with languages other than English. It auto-detects the language from 25+ supported languages, including Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese (Mandarin), and Arabic. No language selector is required.
On xAI's phone-call benchmark, Grok scribe_v2 scores a 5.0% word error rate (WER), versus 12.0% for ElevenLabs, 13.5% for Deepgram, and 21.3% for AssemblyAI. For general speech (podcasts, meetings), accuracy is comparable to Whisper large-v3.
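To make the WER figures concrete: word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below computes it with a standard word-level edit distance. The normalization (lowercasing and splitting on whitespace) is a simplifying assumption. How xAI's benchmark normalizes text is not stated on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("please call me back tomorrow", "please call be back")` counts one substitution and one deletion over five reference words, giving 0.4. A 5.0% WER corresponds to roughly one word-level error per 20 reference words.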
ScribeForge does not keep your audio. The file is forwarded to xAI's API for transcription and deleted immediately. ScribeForge does not store audio or text beyond your browser session.
After 2 free daily transcriptions, ScribeForge shows a paywall. A one-time pack (50 transcriptions) or a monthly unlimited plan is available. The daily limit resets at midnight UTC.
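The reset rule above is simple date arithmetic. As a rough illustration, and not ScribeForge's actual implementation, the sketch below finds the next midnight UTC and the remaining free transcriptions. The function names are hypothetical, and `now` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

FREE_PER_DAY = 2  # free daily transcriptions stated on this page


def next_reset(now: datetime) -> datetime:
    """Return the next midnight UTC strictly after `now` (timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight_today = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


def free_remaining(used_today: int) -> int:
    """Free transcriptions left before the paywall appears."""
    return max(0, FREE_PER_DAY - used_today)
```

Because the reset is pinned to UTC, the local reset time depends on the user's timezone: someone at UTC-5 sees the limit reset at 19:00 local time.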
Use Grok STT online — free, no account needed.
Open Grok STT →
No account · No API key · No credit card · 2 free/day