A plain text transcript is useful. A transcript with timestamps is far more useful: you can jump directly to any moment in the audio, create a linked table of contents, generate subtitles, and extract quotes with a verifiable source time. ScribeForge returns phrase-level timestamps for every transcription, for free.
ScribeForge returns segments: short phrases grouped by natural speech pauses, each with a start and end time in seconds.
Segments are split on natural pauses (silence longer than 0.4 seconds) or at sentence-ending punctuation. This keeps the timing granular enough to be useful without breaking a segment mid-sentence.
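To make the splitting rule concrete, here is a minimal sketch that groups word-level timestamps into segments using the same two triggers: a pause longer than 0.4 seconds, or a word ending in sentence punctuation. This is not ScribeForge's actual implementation. The function names and the `text`/`start`/`end` dictionary shape are assumptions for illustration.

```python
from typing import Dict, List

SENTENCE_END = (".", "?", "!")  # assumed set of sentence-ending marks


def close_segment(words: List[Dict]) -> Dict:
    """Collapse a run of words into one segment spanning first start to last end."""
    return {
        "text": " ".join(w["text"] for w in words),
        "start": words[0]["start"],
        "end": words[-1]["end"],
    }


def group_words(words: List[Dict], max_gap: float = 0.4) -> List[Dict]:
    """Split words into segments on long pauses or sentence-ending punctuation."""
    segments: List[Dict] = []
    current: List[Dict] = []
    for word in words:
        # A silence longer than max_gap since the previous word ends the segment.
        if current and word["start"] - current[-1]["end"] > max_gap:
            segments.append(close_segment(current))
            current = []
        current.append(word)
        # Sentence-ending punctuation also ends the segment.
        if word["text"].endswith(SENTENCE_END):
            segments.append(close_segment(current))
            current = []
    if current:
        segments.append(close_segment(current))
    return segments
```

Raising `max_gap` produces fewer, longer segments; lowering it produces shorter ones that track pauses more closely.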
When you're transcribing a journalistic or research interview, timestamps let you find the exact quote later. Instead of scrubbing through audio, search the text for a keyword and jump to that second. Great for fact-checking and attribution.
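If you have the segments as data, the keyword lookup takes only a few lines. The sketch below assumes each segment is a dictionary with `text` and `start` keys. That shape and the function name `find_keyword` are illustrative, not a documented ScribeForge format.

```python
from typing import Dict, List, Tuple


def find_keyword(segments: List[Dict], keyword: str) -> List[Tuple[float, str]]:
    """Return (start_time, text) for every segment containing the keyword."""
    needle = keyword.casefold()  # case-insensitive match
    return [
        (seg["start"], seg["text"])
        for seg in segments
        if needle in seg["text"].casefold()
    ]


# Hypothetical sample data for illustration only.
segments = [
    {"text": "Good morning everyone", "start": 0.18, "end": 1.40},
    {"text": "The API deadline is confirmed", "start": 872.0, "end": 874.5},
]
for start, text in find_keyword(segments, "deadline"):
    print(f"{start:.2f}s: {text}")
```

Each hit gives you the second to seek to in your audio player.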
Use timestamps to create a chapter list for your episode, such as "00:04:32 - Guest explains their background" or "00:21:07 - Main argument on climate policy". Listeners can jump to the part they care about, and chapter text can help with SEO because search engines index it.
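Chapter lists need times in `HH:MM:SS` form, while segment times are in seconds. A small sketch of the conversion follows. The function names and the `(start_seconds, title)` input shape are assumptions for illustration.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, dropping any fractional part."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chapter_list(chapters: List[Tuple[float, str]]) -> str:
    """Render (start_seconds, title) pairs as one chapter per line."""
    return "\n".join(
        f"{format_timestamp(start)} - {title}" for start, title in chapters
    )


print(chapter_list([
    (272.0, "Guest explains their background"),
    (1267.0, "Main argument on climate policy"),
]))
# 00:04:32 - Guest explains their background
# 00:21:07 - Main argument on climate policy
```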
The segment timestamps are close enough to subtitle timecodes that you can adapt them into an SRT file manually or with a script. Each segment becomes one subtitle entry. Exact timing may need minor adjustment, but you start with most of the work done instead of starting from zero.
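A script for this can be short. The sketch below turns segments into SRT text, with one numbered entry per segment and timecodes in SRT's `HH:MM:SS,mmm` form. It assumes segments are dictionaries with `text`, `start`, and `end` keys, which is an illustrative shape, not a documented export format.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timecode: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Build SRT text: index, time range, subtitle text, blank line between entries."""
    entries = []
    for index, seg in enumerate(segments, start=1):
        entries.append(
            f"{index}\n"
            f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(entries)
```

Write the returned string to a `.srt` file encoded as UTF-8, then load it in your video player or editor to check the sync.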
For recorded meetings, timestamps help you distinguish who said what and when — especially when combined with the segment text that names the speaker. "At 14:32, the product team confirmed the API deadline."
Depositions, recorded calls, and arbitration recordings need verifiable timestamps for evidentiary purposes. The segment timestamps provide a reference that maps directly back to the original audio file's position.
The web interface shows phrase-level timestamps. The underlying xAI Grok STT API (which ScribeForge uses) also returns word-level timestamps — every single word with its own start and end time. These are available if you call the API directly:
"words": [
{"text": "Good", "start": 0.18, "end": 0.44},
{"text": "morning", "start": 0.44, "end": 0.81},
{"text": "everyone", "start": 0.81, "end": 1.40},
...
]
Word-level timestamps are useful for karaoke-style highlighting, precise subtitle sync, or when you need to cut the audio at exact word boundaries. See the developer guide for full API documentation.
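As an illustration of karaoke-style highlighting, the sketch below finds which word is being spoken at a given playback time. It uses the `text`/`start`/`end` fields from the sample response above and assumes the words are sorted by start time. The function name `word_at` is hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def word_at(words: List[Dict], t: float) -> Optional[Dict]:
    """Return the word whose [start, end) interval contains time t, or None."""
    starts = [w["start"] for w in words]  # assumes words are sorted by start time
    i = bisect_right(starts, t) - 1       # last word starting at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]
    return None  # t falls in silence, before the first word, or after the last


words = [
    {"text": "Good", "start": 0.18, "end": 0.44},
    {"text": "morning", "start": 0.44, "end": 0.81},
    {"text": "everyone", "start": 0.81, "end": 1.40},
]
current = word_at(words, 0.5)
print(current["text"] if current else "(silence)")  # morning
```

In a player you would call this on each playback time update and highlight the returned word. For long recordings, compute `starts` once and reuse it rather than rebuilding it on every call.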
The timestamps come directly from xAI's scribe_v2 model alignment, which is trained on forced-alignment data. In practice, phrase-level timestamps are accurate to within ±0.2 seconds. Word-level timestamps are slightly less reliable on fast speech or heavy accents but still within ±0.5 seconds in most cases.
| Tool | Timestamp level | Free tier | Account required |
|---|---|---|---|
| ScribeForge | Phrase + word (API) | Free preview | No |
| Otter.ai | Word | 600 min/month | Yes |
| Whisper (local) | Word + segment | Unlimited | No (GPU needed) |
| Descript | Word | 1 hr/month | Yes |
| Rev | Word | No | Yes (paid only) |
Transcribe your audio with timestamps — free, no account, results in seconds.