xAI API · April 2026

xAI Grok Text-to-Speech MP3 Download — API Guide

By ScribeForge · April 20, 2026 · 5 min read
Note: ScribeForge offers audio transcription (STT) only. This article covers the xAI Grok TTS API for developers who want to generate speech in their own apps.

Most text-to-speech tools let you play audio in the browser but hide the download behind a paywall. This guide covers how xAI Grok TTS works, what the voices sound like, and how to get audio as an MP3 using the Grok TTS API. If you need audio transcription instead, ScribeForge converts any audio file to text — free, no account, no watermark.


How to generate and download TTS audio with the Grok API

1. Get an xAI API key

Sign up at console.x.ai and create an API key. The TTS endpoint is POST https://api.x.ai/v1/tts.

2. Send your text and voice choice

POST a JSON body with text, voice_id (e.g. "Eve"), and an output_format object specifying codec: "mp3". See the full API guide for code examples.

3. Pick a voice

Choose from 5 Grok TTS voices: Eve, Ara, Rex, Sal, or Leo. Each has a distinct character; see the full voice guide to pick the right one.

4. Save the raw MP3 response

The API returns raw MP3 bytes, so write them directly to a file. No post-processing is needed, and the audio is clean, with no watermark.
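The four steps above can be sketched in Python using only the standard library. The endpoint, the field names (text, voice_id, output_format, codec), and the voice names come from this guide; the Bearer-token Authorization header and the helper names build_tts_request and synthesize_to_file are assumptions made for illustration, so check them against the official xAI documentation before relying on them.

```python
import json
import urllib.request

TTS_URL = "https://api.x.ai/v1/tts"  # endpoint from step 1
VOICES = ("Eve", "Ara", "Rex", "Sal", "Leo")  # voices from step 3


def build_tts_request(text, voice_id, api_key):
    """Build (but do not send) the POST request described in steps 1-3."""
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice: {voice_id!r}")
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": {"codec": "mp3"},
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth; not confirmed by this guide.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, voice_id, api_key, path):
    """Send the request and write the raw response bytes to `path` (step 4)."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as f:  # binary mode: the body is raw MP3 bytes
        f.write(audio)
    return len(audio)
```

Under these assumptions, calling synthesize_to_file("Hello from Grok TTS.", "Eve", api_key, "hello.mp3") would write the response to hello.mp3 and return the number of bytes saved. A failed request raises urllib.error.HTTPError, which a real application would want to catch.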

Available voices

Eve: Warm · Female
Ara: Calm · Female
Rex: Bold · Male
Sal: Neutral
Leo: Deep · Male

Eve is the most natural-sounding for general content. Rex and Leo work well for authoritative or cinematic narration. Ara is best for soft, educational content. Sal is the most neutral — good when you want the voice to "get out of the way." Read the voice comparison guide for audio samples and use-case recommendations.

What can you do with the downloaded MP3?

Grok TTS API pricing

xAI charges approximately $0.015 per 1,000 characters of input text. There is no free tier on the API — you pay per character. For light use (a few hundred characters at a time), costs are fractions of a cent per generation.

For regular use — weekly podcast episodes, ongoing narration projects — budget accordingly based on your average text length per generation.
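To make the budgeting concrete, here is a small estimator built on the approximate $0.015 per 1,000 characters figure quoted above. The 30,000-character weekly script is a hypothetical example length, not a number from xAI, and the function name estimate_cost_usd is illustrative.

```python
PRICE_PER_1K_CHARS_USD = 0.015  # approximate rate quoted in this guide


def estimate_cost_usd(num_chars, price_per_1k=PRICE_PER_1K_CHARS_USD):
    """Estimated cost in USD for synthesizing `num_chars` characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * price_per_1k


short_clip = estimate_cost_usd(300)         # "a few hundred characters"
weekly_episode = estimate_cost_usd(30_000)  # hypothetical script length
yearly = weekly_episode * 52                # one episode per week

print(f"300-character clip:    ${short_clip:.4f}")
print(f"30,000-char episode:   ${weekly_episode:.2f}")
print(f"52 episodes per year:  ${yearly:.2f}")
```

At that rate a short clip costs well under a cent, a 30,000-character episode comes to roughly $0.45, and a year of weekly episodes to about $23. Treat these as rough planning numbers, since the quoted price is itself approximate.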

Audio quality and format

The Grok TTS API returns MP3 at 128 kbps, which is ample for speech, though it is not CD quality (uncompressed CD audio runs at about 1,411 kbps). The response body is raw MP3 bytes, so what the API returns is what you save.

If your use case requires WAV or a different bit rate, convert after downloading using Audacity (free) or ffmpeg:

ffmpeg -i input.mp3 -ar 44100 -ac 2 output.wav
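If you convert files in bulk, the same ffmpeg command can be driven from a script. This is a minimal sketch that assumes ffmpeg is installed and on your PATH; the helper names mp3_to_wav_command and convert_mp3_to_wav are made up for this example.

```python
import subprocess


def mp3_to_wav_command(src, dst, sample_rate=44100, channels=2):
    """Return the ffmpeg argument list equivalent to the command above."""
    return [
        "ffmpeg",
        "-i", src,                # input MP3
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of output channels
        dst,                      # output WAV path
    ]


def convert_mp3_to_wav(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(mp3_to_wav_command(src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. Converting to WAV changes the container and file size only; it does not recover detail lost in the original MP3 encoding.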

How does Grok TTS compare?

Tool                | Free download?    | No account?  | Voices | Max chars (free)
xAI Grok TTS (API)  | Yes               | No (API key) | 5      |
ElevenLabs          | Yes (watermarked) | No           | 1000+  | 10,000/month
OpenAI TTS          | No (API only)     | No           | 6      |
Google TTS          | No (API only)     | No           | 40+    |
NaturalReader (web) | No                | No           |        |
