Transcription

Clipthesis can transcribe the spoken audio in your videos locally on your Mac and make the words searchable across your whole library or inside a single clip. No audio ever leaves your machine — transcription runs on your own hardware using the Whisper model.

This is especially useful for interview, vlog, or b-roll footage where you want to find the moment someone said a specific phrase without having to scrub through the timeline.

Getting started

1. Open transcription settings

Go to Settings → Transcription.

2. Pick a model

Transcription needs a Whisper model file. Each model is downloaded once and cached afterwards. Larger models are more accurate but slower:

| Model | Size | Best for |
| --- | --- | --- |
| `tiny.en` | ~75 MB | Very fast, English-only, rough drafts |
| `base` | ~150 MB | Fast, decent quality |
| `small` | ~470 MB | Good balance (default) |
| `medium` | ~1.5 GB | High quality, slower |
| `large-v3` | ~3.1 GB | Best quality, noticeably slower |

Click Download next to a model to fetch it. You can switch models at any time; existing transcripts aren't re-transcribed automatically (use the ↻ button on a clip's transcript panel to regenerate).

3. Language

Pick Auto-detect to let Whisper figure out the language per clip, or lock to a specific language (e.g. English) if you know all your footage is in the same language. Locking can produce slightly better results on accented or noisy speech.

4. Auto-transcribe on import

Toggle Auto-transcribe new videos to have Clipthesis queue a transcription job for every video added during an import. Leave it off if you only want to transcribe specific clips on demand.

5. Transcribe existing clips

Open any video in the detail modal. The Transcript section in the right sidebar shows one of these states:

Not yet transcribed — click Transcribe now to start.
Transcribing… 42% — a job is in progress. You can cancel from here.
No audio — the clip has no audio track; nothing to transcribe.
Failed — Retry — something went wrong (often a missing model file). Click Retry.

You can also bulk-transcribe everything in your library from Settings → Transcription → Transcribe all untranscribed videos.

Searching transcripts globally

The main search bar at the top of the Library page searches across tags, filenames, and transcripts.

1. Click the search bar (or press ⌘F).
2. Start typing. A dropdown offers three options:
   - Search tag "…": filter clips by tag
   - Search filename "…": match filenames
   - Search transcripts for "…": match spoken words
3. Click Search transcripts for "…" (the one with the subtitles icon).

The grid updates to show only clips whose transcript contains the phrase. Each matching clip's card shows a snippet of the transcript around the match, with the hit highlighted. Click a result to open the clip.

Transcript search uses full-text indexing (SQLite FTS5), so it's fast even across thousands of clips. It ignores case and diacritics, so searching for "cafe" also finds "Café".
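If you're curious how this kind of search behaves, here is a minimal sketch using Python's built-in `sqlite3` module. It is not Clipthesis's actual schema: the `transcripts` table, its columns, the sample rows, and the `search_transcripts` helper are all assumptions made up for illustration. It only shows how an FTS5 index with the `unicode61` tokenizer can match a phrase regardless of case and diacritics and return a highlighted snippet.

```python
import sqlite3

# Hypothetical schema (not Clipthesis's real one): one FTS5 row per clip.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5("
    "clip_name UNINDEXED, body, "
    "tokenize = 'unicode61 remove_diacritics 1')"
)
conn.executemany(
    "INSERT INTO transcripts (clip_name, body) VALUES (?, ?)",
    [
        ("interview_01.mov", "We opened the Café in Lisbon last spring."),
        ("vlog_07.mp4", "Today I am testing the new camera."),
    ],
)


def search_transcripts(conn, phrase):
    """Return (clip_name, snippet) pairs for clips whose transcript has the phrase."""
    # Wrap the input in double quotes so FTS5 reads it as a literal phrase
    # rather than as query syntax; embedded quotes are doubled.
    query = '"' + phrase.replace('"', '""') + '"'
    rows = conn.execute(
        # snippet(): column 1 is `body`; the hit is wrapped in [ ].
        "SELECT clip_name, snippet(transcripts, 1, '[', ']', '...', 8) "
        "FROM transcripts WHERE transcripts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


for clip_name, snippet in search_transcripts(conn, "cafe"):
    print(clip_name, "->", snippet)
```

The tokenizer folds case and strips Latin diacritics when indexing and when querying, which is why the unaccented, lowercase query still finds the accented, capitalised word.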

Filtering to clips that have speech

In the filter sidebar, toggle Has speech to hide clips without spoken content (including clips with no audio track at all). Combine this with tag, drive, or date filters as usual.

Searching inside a single clip

Once you've opened a clip in the detail modal, the Transcript panel in the right sidebar lists every spoken segment with a timestamp. For long-form clips like interviews, scrolling through hundreds of segments to find a specific quote is slow. Use the in-panel filter:

1. Click the Filter transcript… input at the top of the transcript panel.
2. Start typing. The list narrows to segments containing your phrase (case-insensitive), with each match highlighted.
3. A counter above the list shows K of N matches.
4. Click any filtered segment to jump playback to that moment.
5. Clear the filter with the × button or by pressing Escape while the input is focused.

While a filter is active, playback auto-scroll is paused so the active segment doesn't jump around under your cursor.
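The in-panel filter described above boils down to a case-insensitive substring match over the segment list. The sketch below is an illustration only: the `Segment` type, `filter_segments`, and `counter_label` are hypothetical names, and it assumes the counter reads as "matching segments of total segments", which the app may interpret differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one transcript segment.
    start_seconds: float
    text: str


def filter_segments(segments, phrase):
    """Return the segments whose text contains `phrase`, ignoring case."""
    needle = phrase.strip().casefold()
    if not needle:
        # An empty filter shows every segment.
        return list(segments)
    return [s for s in segments if needle in s.text.casefold()]


def counter_label(matches, segments):
    # Assumption: K = matching segments, N = all segments in the clip.
    return f"{len(matches)} of {len(segments)} matches"


segments = [
    Segment(0.0, "Welcome to the Studio."),
    Segment(4.2, "Let's look at the lighting."),
    Segment(9.8, "The studio lights are new."),
]
matches = filter_segments(segments, "STUDIO")
print(counter_label(matches, segments))
for segment in matches:
    # Clicking a segment would seek playback to segment.start_seconds.
    print(segment.start_seconds, segment.text)
```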

Exporting a transcript

From the transcript panel toolbar:

.srt — standard subtitle file with timecodes, usable in any NLE
.txt — plain-text transcript with timestamps
Copy — copy the full transcript to the clipboard as prose (no timestamps)
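For reference, an .srt file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timecode line followed by the text and a blank line. The sketch below shows how transcript segments could be turned into that layout. It assumes each segment carries a start time, an end time, and text; the `srt_timestamp` and `to_srt` helpers are hypothetical and not part of Clipthesis.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timecode (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(to_srt([(0.0, 2.5, "Hello there."), (2.5, 4.0, "Welcome back.")]))
```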

Privacy & offline

Transcription is fully local. Audio is processed by Whisper inside the app and never leaves your machine. Transcription itself needs no network access; the only download is the model file, fetched directly from Hugging Face the first time you need it. After that, everything runs offline.

Troubleshooting

"The model isn't downloaded yet" — go to Settings → Transcription and click Download next to the model you've selected.

A transcription job is stuck "pending" — quit and relaunch Clipthesis. On startup, any jobs that were running when the app closed are reset to pending and picked up by the queue again.

The transcript text looks wrong or garbled — try a larger model (small → medium). If your clip has heavy background music or multiple overlapping voices, Whisper will struggle — Clipthesis doesn't currently do speaker diarization.

Searching for a phrase I know is there returns nothing — make sure the clip has finished transcribing (the transcript panel will show the segments once it's done) and that the model matches the clip's language.
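The restart behaviour mentioned for stuck "pending" jobs can be pictured with a small sketch. Everything here is an assumption for illustration: the `jobs` table, its `status` values, and the `reset_interrupted_jobs` and `next_pending_job` helpers are hypothetical, and Clipthesis's real queue may be stored differently. The idea is simply that on startup, anything left marked as running is put back in the queue.

```python
import sqlite3


def reset_interrupted_jobs(conn):
    """Mark jobs left 'running' by a previous session as 'pending' again."""
    cursor = conn.execute(
        "UPDATE jobs SET status = 'pending' WHERE status = 'running'"
    )
    conn.commit()
    return cursor.rowcount  # number of jobs that were recovered


def next_pending_job(conn):
    """Return (id, clip_name) of the oldest pending job, or None."""
    return conn.execute(
        "SELECT id, clip_name FROM jobs "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()


# Hypothetical queue state left behind after the app was closed mid-job.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, clip_name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (clip_name, status) VALUES (?, ?)",
    [
        ("interview_01.mov", "done"),
        ("vlog_07.mp4", "running"),
        ("broll_12.mp4", "pending"),
    ],
)

recovered = reset_interrupted_jobs(conn)
print(recovered, next_pending_job(conn))
```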
