Does Discord record voice calls natively?

No. Discord has no built-in recording feature for voice channels. The standard approach is to invite a recording bot — Craig (craig.chat) is the most widely used. It joins as a participant, captures multi-track audio, and DMs you a download link when the call ends.

Is recording a Discord call legal?

Depends on jurisdiction. In two-party-consent regions (e.g. several US states, much of the EU), all participants must consent. In one-party-consent regions, the recording host's consent is enough. Always announce the recording explicitly when Craig joins. ScribeForge has no opinion on the recording itself — it transcribes whatever you upload.

Why use multiple FLAC tracks instead of a single mixed file?

Each track contains only one speaker's voice, which means perfect speaker labeling without relying on automatic diarization. Transcribe each .flac, label by speaker name (filename includes the Discord username), and merge by timestamp. Cleaner than letting Grok diarize a busy multi-speaker mix.

What about the 25 MB upload cap?

FLAC is lossless and large. For a 1-hour call, each per-speaker FLAC is typically 30–100 MB. Convert to OGG Opus first to shrink it 5–10×: 'ffmpeg -i speaker.flac -c:a libopus -b:a 32k -ac 1 speaker.ogg'. The '-ac 1' flag downmixes to mono; Opus at 32 kbps mono is plenty for transcription accuracy.
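As a rough check on the numbers above, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the cap counts megabytes as 10^6 bytes and ignores OGG container overhead. libopus encodes with variable bitrate by default, so real files will land near these figures rather than exactly on them.

```python
def opus_size_mb(duration_s: float, bitrate_kbps: float = 32) -> float:
    """Approximate encoded size in MB (10^6 bytes), ignoring container overhead."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1_000_000


def max_duration_min(cap_mb: float = 25, bitrate_kbps: float = 32) -> float:
    """Longest recording, in minutes, that stays under cap_mb at this bitrate."""
    return cap_mb * 1_000_000 * 8 / (bitrate_kbps * 1000) / 60


print(f"1 h at 32 kbps is about {opus_size_mb(3600):.1f} MB")
print(f"25 MB holds about {max_duration_min():.0f} min at 32 kbps")
```

By this estimate a one-hour track comes to roughly 14.4 MB, and a single track can run a little over 100 minutes before it approaches the cap. Longer calls would need a lower bitrate or splitting.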

Can I record without Craig?

Yes. OBS Studio captures system audio plus your mic to a single file and works on Windows, macOS, and Linux. Per-speaker separation is lost (single mixed track), so diarization quality degrades. Craig with multi-track FLAC is the cleaner option for transcription.

Use case · Updated April 29, 2026

How to Transcribe a Discord Call — Free

Discord doesn't record voice channels natively. Here's the cleanest workflow: Craig bot for recording, ScribeForge for transcription. End-to-end, free.

Yes — any Discord voice call can be transcribed for free, but Discord itself doesn't record. The standard combo is Craig (free recording bot) → ScribeForge (transcription). Craig captures multi-track audio (one file per speaker), and ScribeForge transcribes each track in 10–30 seconds. Result: a perfectly labeled transcript with no diarization guesswork.

Heads up — recording laws vary by jurisdiction. In two-party-consent regions (most of the EU, several US states), every participant must agree before recording. Announce the recording when Craig joins.

Why Craig + ScribeForge beats single-file recording

The naive approach is to use OBS Studio or system audio recording, which gives you one mixed audio file with all speakers overlapping. Grok STT can diarize this (Speaker 1, Speaker 2…), but accuracy drops on noisy multi-speaker mixes.

Craig records each speaker on a separate track. You get one FLAC per Discord username. Transcribe each FLAC individually and the speaker labels come for free, with no diarization needed. Merge the transcripts by timestamp at the end.

The 5-step workflow

1. Add Craig: visit craig.chat and invite the bot to your server. Admin permissions required. Free tier is enough for casual calls.
2. Start recording: in your voice channel, type ":craig:, join" (newer Craig versions may expect the slash command /join instead). Craig joins as a participant and starts capturing. Announce the recording to all participants.
3. End recording: type ":craig:, leave" (or /stop on newer versions), or end the call. Craig DMs the recording host a download link.
4. Pick "Multiple FLAC tracks": one file per speaker, lossless. Each filename includes the Discord username, so you already know who is who.
5. Transcribe each track: drop each .flac into ScribeForge. The result is a clean per-speaker transcript. Merge the transcripts by timestamp at the end.
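Before uploading, it can help to see which downloaded tracks are too large for the 25 MB cap. The sketch below is one way to check. The function name is hypothetical, and it assumes the cap counts megabytes as 10^6 bytes, which may not match how the upload form measures size.

```python
from pathlib import Path

CAP_BYTES = 25 * 1000 * 1000  # assumption: 25 MB counted as 10^6-byte megabytes


def needs_compression(folder: str, cap_bytes: int = CAP_BYTES) -> list[str]:
    """Return the names of .flac files in folder larger than cap_bytes, sorted."""
    return sorted(
        p.name for p in Path(folder).glob("*.flac") if p.stat().st_size > cap_bytes
    )
```

Calling needs_compression(".") from the folder holding the Craig download lists the tracks that need converting first. An empty list means every track can be uploaded as-is.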

Compress FLAC before upload (25 MB cap)

FLAC is lossless and large. A 1-hour call produces 30–100 MB per speaker — over the 25 MB cap. Convert to OGG Opus at 32 kbps mono first:

ffmpeg -i speaker.flac -c:a libopus -b:a 32k -ac 1 speaker.ogg

32 kbps Opus mono is plenty for transcription accuracy — the human voice fits in that bitrate cleanly. Result is 5–10× smaller, fits well under the cap.
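With several speakers, running the command by hand gets tedious. A minimal sketch of a batch version follows, assuming ffmpeg is on the PATH. The helper names are hypothetical, and the flags are exactly the ones in the command above.

```python
import subprocess
from pathlib import Path


def opus_command(src: Path) -> list[str]:
    """Build the ffmpeg call for one track: same flags as above, .ogg output."""
    return [
        "ffmpeg", "-i", str(src),
        "-c:a", "libopus", "-b:a", "32k", "-ac", "1",
        str(src.with_suffix(".ogg")),
    ]


def convert_all(folder: str = ".") -> None:
    """Convert every .flac in folder, skipping tracks that already have an .ogg."""
    for src in sorted(Path(folder).glob("*.flac")):
        if src.with_suffix(".ogg").exists():
            continue  # avoids ffmpeg's interactive overwrite prompt
        subprocess.run(opus_command(src), check=True)  # raises if ffmpeg fails
```

Running convert_all() from the folder with the Craig download leaves each original .flac in place and writes an .ogg with the same base name next to it, so the Discord username in the filename is preserved.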

Combining per-speaker transcripts

Each ScribeForge transcript includes phrase-level timestamps (start, end, text). For a unified call transcript:

Save each transcript as speaker-{username}.json (download includes timestamps).
Merge into a single timeline by start-time, prefixing each line with the speaker name.
One Python snippet does this:

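import glob
import json

events = []
for fn in sorted(glob.glob("speaker-*.json")):
    # "speaker-alice.json" -> "alice" (str.removeprefix needs Python 3.9+)
    speaker = fn.removeprefix("speaker-").removesuffix(".json")
    with open(fn, encoding="utf-8") as f:
        segments = json.load(f)["segments"]
    for seg in segments:
        events.append((seg["start"], speaker, seg["text"]))

# Sort by start time only; segments that start together keep file order.
events.sort(key=lambda e: e[0])
for t, who, text in events:
    print(f"[{t:7.2f}] {who}: {text}")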
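The snippet above prints one line per phrase, which can look choppy when one person speaks for a while. A possible refinement, sketched here, joins consecutive phrases from the same speaker into a single turn. It assumes each event also carries the segment's end time, and max_gap is a hypothetical threshold rather than anything ScribeForge defines.

```python
def merge_turns(events, max_gap=2.0):
    """Join consecutive segments from one speaker into a single turn.

    events: (start, end, speaker, text) tuples, already sorted by start.
    max_gap: pause in seconds above which the same speaker starts a new turn.
    """
    turns = []
    for start, end, speaker, text in events:
        if turns and turns[-1][2] == speaker and start - turns[-1][1] <= max_gap:
            prev = turns[-1]
            # Extend the previous turn instead of starting a new line.
            turns[-1] = (prev[0], max(prev[1], end), speaker, prev[3] + " " + text)
        else:
            turns.append((start, end, speaker, text))
    return turns


demo = [
    (0.0, 1.5, "alice", "Hi all."),
    (1.6, 3.0, "alice", "Can you hear me?"),
    (3.2, 4.0, "bob", "Yes."),
    (9.0, 10.0, "bob", "Let's start."),
]
for start, end, who, text in merge_turns(demo):
    print(f"[{start:7.2f}] {who}: {text}")
```

In the demo, the two phrases from alice collapse into one line, while the two from bob stay separate because the pause between them exceeds max_gap.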

Common questions

Are there alternatives to Craig?

OBS Studio captures system audio plus your mic to a single file. That loses per-speaker separation, so diarization becomes Grok's job (it works, just less cleanly). Browser-based Discord clients can also be captured with browser extensions like "Voice Recorder", with the same caveats.

What's the legal answer on consent?

Two-party-consent (all-party) jurisdictions (most of the EU, US states like California, Florida, Pennsylvania, …): all participants must consent. One-party-consent jurisdictions: the consent of one participant, typically the person recording, is enough. Best practice everywhere: announce the recording when Craig joins. ScribeForge does not record; it only transcribes whatever audio you provide.

How accurate is Grok on Discord audio?

Discord transmits voice as Opus, which is generally clean. On per-speaker tracks (Craig's multi-FLAC), word error rate (WER) is in the 5–8% range, close to studio quality. Mixed multi-speaker tracks degrade somewhat depending on overlap.
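To check WER on your own calls, you can hand-correct a short excerpt and compare it with the machine transcript. The sketch below uses the standard definition (word-level edit distance divided by the number of reference words). It only lowercases and splits on whitespace, so punctuation differences count as errors, and it is not necessarily how the figures above were measured.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion: ref word missing from hyp
                cur[j - 1] + 1,          # insertion: extra word in hyp
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


print(wer("can everyone hear me now", "can everyone here me"))
```

The example has one substitution ("hear" to "here") and one deletion ("now") against five reference words, giving a WER of 0.4, or 40%.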

Is the audio stored on ScribeForge?

No. Audio is sent to xAI's Grok STT API for processing and deleted immediately afterwards. The transcript exists only in your browser session.

Drop a Discord call recording — get a clean transcript.

No account · No credit card · 2 free uses/day per IP