How to Transcribe a Discord Call — Free
Yes: any Discord voice call can be transcribed for free, but Discord itself doesn't record. The standard combo is Craig (a free recording bot) followed by ScribeForge (transcription). Craig captures multi-track audio (one file per speaker), and ScribeForge transcribes each track in 10–30 seconds. The result is a transcript labeled by speaker, with no diarization guesswork.
Why Craig + ScribeForge beats single-file recording
The naive approach is to record with OBS Studio or system-audio capture, which gives you one mixed audio file with all speakers overlapping. Grok STT, the xAI speech-to-text API that ScribeForge uses, can diarize this (Speaker 1, Speaker 2…), but accuracy drops on noisy multi-speaker mixes.
Craig records each speaker on a separate track, so you get one FLAC per Discord username. Transcribing each FLAC individually gives automatic per-speaker labeling with no diarization needed. Merge the transcripts by timestamp at the end.
The 5-step workflow
- Add Craig: visit craig.chat and invite the bot to your server. Admin permissions required. The free tier is enough for casual calls.
- Start recording: in your voice channel, type :craig:, join (recent Craig versions use the /join slash command instead). Craig joins as a participant and starts capturing. Announce the recording to all participants.
- End recording: type :craig:, leave (or /stop on recent versions), or end the call. Craig DMs the recording host a download link.
- Pick "Multiple FLAC tracks": one file per speaker, lossless. The filename includes the Discord username, so you already know who is who.
- Transcribe each track: drop each .flac into ScribeForge. The result is a clean per-speaker transcript. Combine by timestamp at the end.
Compress FLAC before upload (25 MB cap)
FLAC is lossless and large. A 1-hour call produces 30–100 MB per speaker — over the 25 MB cap. Convert to OGG Opus at 32 kbps mono first:
ffmpeg -i speaker.flac -c:a libopus -b:a 32k -ac 1 speaker.ogg
32 kbps mono Opus is plenty for transcription accuracy, because speech stays clean at that bitrate. At 32 kbps an hour of audio comes to roughly 14 MB, several times smaller than the FLAC and under the cap.
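As a rough check on the numbers above, the sketch below estimates the size of a converted track and builds the same ffmpeg command for every FLAC in a folder. It assumes the 25 MB cap means 25 × 10^6 bytes and ignores container overhead and Opus's variable bitrate, so treat the figures as approximate. The helper names (`estimated_size_mb`, `max_duration_min`, `ffmpeg_command`) are illustrative and not part of any tool mentioned here.

```python
import subprocess
from pathlib import Path

CAP_MB = 25  # upload cap from the text; assumed to mean 25 * 10**6 bytes


def estimated_size_mb(duration_s: float, bitrate_kbps: float = 32) -> float:
    """Approximate stream size in MB, ignoring container overhead."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1_000_000


def max_duration_min(cap_mb: float = CAP_MB, bitrate_kbps: float = 32) -> float:
    """Longest recording (in minutes) that should still fit under the cap."""
    return cap_mb * 1_000_000 * 8 / (bitrate_kbps * 1000) / 60


def ffmpeg_command(flac: Path, bitrate_kbps: int = 32) -> list[str]:
    """Build the conversion command from the text for a single track."""
    return [
        "ffmpeg", "-i", str(flac),
        "-c:a", "libopus", "-b:a", f"{bitrate_kbps}k", "-ac", "1",
        str(flac.with_suffix(".ogg")),
    ]


if __name__ == "__main__":
    print(f"1 hour at 32 kbps is about {estimated_size_mb(3600):.1f} MB")
    print(f"The cap is reached after about {max_duration_min():.0f} minutes")
    # Convert every FLAC in the current directory (requires ffmpeg on PATH).
    for flac in sorted(Path(".").glob("*.flac")):
        subprocess.run(ffmpeg_command(flac), check=True)
```

Under these assumptions, calls longer than about 100 minutes would need a lower bitrate or would have to be split before upload.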
Combining per-speaker transcripts
Each ScribeForge transcript includes phrase-level timestamps (start, end, text). For a unified call transcript:
- Save each transcript as speaker-{username}.json (the download includes timestamps).
- Merge into a single timeline by start time, prefixing each line with the speaker name.
- One Python snippet does this:
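import glob
import json

events = []
for fn in glob.glob("speaker-*.json"):
    # Recover the speaker name from the filename: speaker-{username}.json
    speaker = fn.removeprefix("speaker-").removesuffix(".json")
    with open(fn, encoding="utf-8") as f:
        for seg in json.load(f)["segments"]:
            events.append((seg["start"], speaker, seg["text"]))

# Sort by start time so the speakers interleave chronologically
events.sort()
for t, who, text in events:
    print(f"[{t:7.2f}] {who}: {text}")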
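Phrase-level segments can make the merged timeline choppy when one person speaks several phrases in a row. The sketch below is one possible refinement, not part of ScribeForge: it joins consecutive segments from the same speaker into a single turn and prints clock-style timestamps. It assumes each event carries the start, end, and text fields described above; `merge_turns`, `clock`, and the `max_gap` threshold are hypothetical names chosen for illustration.

```python
def merge_turns(events, max_gap=1.0):
    """Join consecutive same-speaker segments into turns.

    events: iterable of (start, end, speaker, text) tuples, in any order.
    max_gap: longest pause (seconds) still treated as the same turn.
    """
    turns = []
    for start, end, speaker, text in sorted(events):
        last = turns[-1] if turns else None
        if last and last["speaker"] == speaker and start - last["end"] <= max_gap:
            # Same speaker resumes after a short pause: extend the turn.
            last["end"] = max(last["end"], end)
            last["text"] += " " + text.strip()
        else:
            turns.append(
                {"start": start, "end": end, "speaker": speaker, "text": text.strip()}
            )
    return turns


def clock(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    sample = [
        (0.0, 1.5, "alice", "Hi"),
        (5.0, 6.0, "bob", "Hey"),
        (1.8, 3.0, "alice", "there"),
    ]
    for turn in merge_turns(sample):
        print(f"[{clock(turn['start'])}] {turn['speaker']}: {turn['text']}")
```

A speaker change always starts a new turn, so interruptions stay visible in the transcript even with a generous `max_gap`.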
Common questions
You can also record without a bot. OBS Studio captures system audio plus your mic to a single file. That loses per-speaker separation, so diarization falls to Grok STT (it works, just less cleanly). Browser-based Discord clients can also be captured with browser extensions like "Voice Recorder", with the same caveats.
Whether recording is legal depends on where the participants are. In two-party-consent jurisdictions (most of the EU, US states like California, Florida, Pennsylvania, …), all participants must consent. In one-party-consent jurisdictions, consent from one participant, typically the recorder, is enough. Best practice in every case: announce the recording when Craig joins. ScribeForge does not record; it only transcribes whatever audio you provide.
Discord ships voice over Opus, which is generally clean. On per-speaker tracks (Craig's multi-FLAC export), word error rate (WER) is in the 5–8% range, close to studio quality. Mixed multi-speaker tracks degrade somewhat, depending on how much the speakers overlap.
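For readers unfamiliar with the metric: WER is the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference transcript, divided by the number of reference words, so 5–8% means roughly 5 to 8 errors per 100 spoken words. The sketch below is a minimal way to measure it on your own recordings, assuming you have a hand-corrected reference. It lowercases and splits on whitespace only; real evaluations usually normalize punctuation and numbers too, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion: reference word missing
                cur[j - 1] + 1,      # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps"
    hyp = "the quick brown box jumps"
    print(f"WER: {word_error_rate(ref, hyp):.0%}")
```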
ScribeForge does not store your audio. It is sent to xAI's Grok STT API for processing and deleted immediately. The transcript exists only in your browser session.
Drop a Discord call recording — get a clean transcript.
Transcribe Discord call free →
No account · No credit card · 2 free uses/day per IP