Transcription
Clipthesis can transcribe the spoken audio in your videos locally on your Mac and make the words searchable across your whole library or inside a single clip. No audio ever leaves your machine — transcription runs on your own hardware using the Whisper model.
This is especially useful for interview, vlog, or b-roll footage where you want to find the moment someone said a specific phrase without having to scrub through the timeline.
Getting started
1. Open transcription settings
Go to Settings → Transcription.
2. Pick a model
The first time you transcribe a clip, Clipthesis downloads a Whisper model file (once — it's cached afterwards). Larger models are more accurate but slower:
| Model | Size | Best for |
|---|---|---|
| tiny.en | ~75 MB | Very fast, English-only, rough drafts |
| base | ~150 MB | Fast, decent quality |
| small | ~470 MB | Good balance — default |
| medium | ~1.5 GB | High quality, slower |
| large-v3 | ~3.1 GB | Best quality, noticeably slower |
Click Download next to a model to fetch it. You can switch models at any time; existing transcripts aren't re-transcribed automatically (use the ↻ button on a clip's transcript panel to regenerate).
3. Language
Pick Auto-detect to let Whisper figure out the language per clip, or lock to a specific language (e.g. English) if you know all your footage is in the same language. Locking can produce slightly better results on accented or noisy speech.
4. Auto-transcribe on import
Toggle Auto-transcribe new videos to have Clipthesis queue a transcription job for every video added during an import. Leave it off if you only want to transcribe specific clips on demand.
5. Transcribe existing clips
Open any video in the detail modal. The Transcript section in the right sidebar shows one of these states:
- Not yet transcribed — click Transcribe now to start.
- Transcribing… 42% — a job is in progress. You can cancel from here.
- No audio — the clip has no audio track; nothing to transcribe.
- Failed — Retry — something went wrong (often a missing model file). Click Retry.
You can also bulk-transcribe everything in your library from Settings → Transcription → Transcribe all untranscribed videos.
Searching transcripts globally
The main search bar at the top of the Library page searches across tags, filenames, and transcripts.
- Click the search bar (or press ⌘F).
- Start typing. A dropdown offers three options:
- Search tag "…" — filter clips by tag
- Search filename "…" — match filenames
- Search transcripts for "…" — match spoken words
- Click Search transcripts for "…" (the one with the subtitles icon).
The grid updates to show only clips whose transcript contains the phrase. Each matching clip's card shows a snippet of the transcript around the match, with the hit highlighted. Click a result to open the clip.
Transcript search uses full-text indexing (SQLite FTS5), so it's fast even across thousands of clips. It's case-insensitive and tolerates diacritics.
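As a sketch of how an FTS5-backed search behaves (the table and column names below are illustrative, not Clipthesis internals), note how the `unicode61` tokenizer's `remove_diacritics` option lets "cafe" match "café", and how `MATCH` is case-insensitive by default:

```python
import sqlite3

# Hypothetical schema for illustration only — not the app's real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5("
    "clip_id UNINDEXED, text, "
    "tokenize = 'unicode61 remove_diacritics 2')"
)
conn.executemany(
    "INSERT INTO transcripts (clip_id, text) VALUES (?, ?)",
    [
        ("clip-1", "We crossed the café at noon."),
        ("clip-2", "The interview starts after lunch."),
    ],
)

def search(phrase: str) -> list[str]:
    # Quoting the phrase avoids FTS5 query-syntax surprises; matching is
    # case-insensitive, and diacritics are stripped at tokenization time.
    rows = conn.execute(
        "SELECT clip_id FROM transcripts WHERE transcripts MATCH ?",
        (f'"{phrase}"',),
    ).fetchall()
    return [r[0] for r in rows]
```

Because the query text is tokenized the same way as the indexed text, `search("cafe")` finds "café" and `search("INTERVIEW")` finds "interview" without any extra normalization code.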
Filtering to clips that have speech
In the filter sidebar, toggle Has speech to hide clips without spoken content (including clips with no audio track at all). Combine this with tag, drive, or date filters as usual.
Searching inside a single clip
Once you've opened a clip in the detail modal, the Transcript panel in the right sidebar lists every spoken segment with a timestamp. For long-form clips like interviews, scrolling through hundreds of segments to find a specific quote is slow. Use the in-panel filter:
- Click the Filter transcript… input at the top of the transcript panel.
- Start typing. The list narrows to segments containing your phrase (case-insensitive), with each match highlighted.
- A counter above the list shows K of N matches.
- Click any filtered segment to jump playback to that moment.
- Clear the filter with the × button or by pressing Escape while the input is focused.
While a filter is active, playback auto-scroll is paused so the active segment doesn't jump around under your cursor.
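The in-panel filter described above amounts to a case-insensitive substring match over the clip's timestamped segments, plus the "K of N matches" counter. A minimal sketch (the segment data and function are illustrative, not the app's actual code):

```python
# Hypothetical (timestamp, text) segments for the example.
segments = [
    (12.0, "Let's talk about the budget."),
    (45.5, "The Budget review is next week."),
    (90.0, "Any other business?"),
]

def filter_segments(query: str):
    # Case-insensitive substring match, like the panel's filter input.
    q = query.lower()
    matches = [(t, text) for t, text in segments if q in text.lower()]
    counter = f"{len(matches)} of {len(segments)} matches"
    return matches, counter

hits, counter = filter_segments("budget")
```

Clicking a filtered segment then just seeks playback to that segment's timestamp.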
Exporting a transcript
From the transcript panel toolbar:
- .srt — standard subtitle file with timecodes, usable in any NLE
- .txt — plain-text transcript with timestamps
- Copy — copy the full transcript to the clipboard as prose (no timestamps)
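For reference, an .srt file is a sequence of numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the subtitle text. A minimal sketch of that layout (the segment data is made up for the example):

```python
def to_srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm — note the comma before the milliseconds.
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[tuple[float, float, str]]) -> str:
    # Each block: index, time range, text, blank separator line.
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

srt = segments_to_srt([(0.0, 2.5, "Hello there."), (2.5, 5.0, "Welcome back.")])
```

Because this format is a plain-text standard, the exported file drops into any NLE or video player that accepts subtitles.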
Privacy & offline
Transcription is 100% local. Audio is decoded and fed to the Whisper model inside the app and never leaves your machine. Model files are downloaded directly from Hugging Face the first time you need them; after that, everything runs offline.
Troubleshooting
"The model isn't downloaded yet" — go to Settings → Transcription and click Download next to the model you've selected.
A transcription job is stuck "pending" — quit and relaunch Clipthesis. On startup, any jobs that were running when the app closed are reset to pending and picked up by the queue again.
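That startup reset can be pictured as a single requeue pass over the job table; the schema below is hypothetical, not Clipthesis's real database:

```python
import sqlite3

# Hypothetical jobs table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, clip TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (clip, status) VALUES (?, ?)",
    [("a.mp4", "done"), ("b.mp4", "running"), ("c.mp4", "pending")],
)

def reset_stale_jobs(db: sqlite3.Connection) -> int:
    # A job can only be 'running' while the app is alive, so at launch
    # every 'running' row is stale and safe to put back in the queue.
    cur = db.execute("UPDATE jobs SET status = 'pending' WHERE status = 'running'")
    db.commit()
    return cur.rowcount

requeued = reset_stale_jobs(conn)
```

After the pass, the normal queue picks the requeued jobs up again in order.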
The transcript text looks wrong or garbled — try a larger model (small → medium). If your clip has heavy background music or multiple overlapping voices, Whisper will struggle — Clipthesis doesn't currently do speaker diarization.
Searching for a phrase I know is there returns nothing — make sure the clip has finished transcribing (the transcript panel will show the segments once it's done) and that the model matches the clip's language.