🎙️ Workflow Template

Local Audio Transcription

Transcribe audio and video files locally with Whisper — private, fast, and free with no API costs.

Setup time

10–20 minutes

Min hardware

4 GB VRAM (GTX 1660 / RX 580) or CPU-only

Software

Whisper.cpp or faster-whisper

Recommended model

Whisper Large v3

1

Install faster-whisper

Run: pip install faster-whisper. faster-whisper is a reimplementation of OpenAI's Whisper built on the CTranslate2 inference engine, which makes it substantially faster than the original while producing the same transcripts.

Tip:

Works on CPU if you don't have a GPU — just slower

2

Download the model

Models are downloaded automatically on first use. For CPU: use tiny or base. For GPU with 4+ GB VRAM: use large-v3.

3

Transcribe your first file

Run this Python script:

from faster_whisper import WhisperModel

# Use device="cpu" (and compute_type="int8") if you don't have a CUDA GPU
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")
for seg in segments:
    print(f"[{seg.start:.1f}s] {seg.text}")

4

Batch process video files

For video files, first extract the audio track with FFmpeg: ffmpeg -i video.mp4 -vn audio.mp3, then transcribe the resulting audio file as in step 3.
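The extract-then-transcribe step above can be scripted over a whole folder. A minimal sketch, assuming ffmpeg is on your PATH and the videos are .mp4 files in one directory (the function names and folder layout are illustrative):

```python
import subprocess
from pathlib import Path

def ffmpeg_extract_cmd(video: Path) -> list[str]:
    """Build the ffmpeg command from step 4: drop video (-vn), write an mp3 next to the source."""
    audio = video.with_suffix(".mp3")
    return ["ffmpeg", "-y", "-i", str(video), "-vn", str(audio)]

def batch_extract(folder: str) -> list[Path]:
    """Extract audio from every .mp4 in `folder`; return the resulting mp3 paths."""
    audios = []
    for video in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(ffmpeg_extract_cmd(video), check=True)
        audios.append(video.with_suffix(".mp3"))
    return audios
```

Each returned path can then be fed to model.transcribe() from step 3 in a loop.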

Tip:

faster-whisper can process 1 hour of audio in ~2 minutes on RTX 3060

5

Export subtitles (SRT)

Pass word_timestamps=True to model.transcribe() to get word-level timing for SRT export. Tools such as Subtitle Edit can then import the resulting file.
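faster-whisper returns segment timings but does not write SRT files itself, so a small formatter is needed. A sketch, assuming segments are supplied as (start, end, text) tuples — the faster-whisper segment objects expose these as .start, .end, and .text — and with function names of my own invention:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn (start, end, text) tuples into numbered SRT subtitle blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Writing the returned string to a .srt file gives something Subtitle Edit and most video players can open directly.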
