Local Audio Transcription
Transcribe audio and video files locally with Whisper — private, fast, and free with no API costs.
Setup time
10–20 minutes
Min hardware
4 GB VRAM (GTX 1660 / RX 580) or CPU-only
Software
Whisper.cpp or faster-whisper
Recommended model
Whisper Large v3
Install faster-whisper
Run: pip install faster-whisper. faster-whisper is a reimplementation of OpenAI's Whisper built on CTranslate2, a fast C++ inference engine — whisper.cpp is the separate pure-C/C++ port.
Works on CPU if you don't have a GPU — just slower
Download the model
Models are downloaded automatically on first use. For CPU: use tiny or base. For GPU with 4+ GB VRAM: use large-v3.
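The model-size guidance above can be sketched as a small helper. This is illustrative only — the function name and VRAM threshold are our assumptions, not part of faster-whisper:

```python
def pick_model(has_gpu: bool, vram_gb: float = 0) -> tuple[str, str]:
    """Return an illustrative (model_size, compute_type) pair for WhisperModel,
    following the guidance above: large-v3 on a 4+ GB GPU, a small model on CPU."""
    if has_gpu and vram_gb >= 4:
        return "large-v3", "float16"  # full-accuracy model on GPU
    return "base", "int8"             # CPU fallback: small model, quantized

# Usage sketch:
# model_size, compute_type = pick_model(has_gpu=True, vram_gb=12)
# model = WhisperModel(model_size, device="cuda", compute_type=compute_type)
```

On first use, WhisperModel downloads the chosen weights automatically, so no separate download step is needed.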
Transcribe your first file
Run this Python script:
from faster_whisper import WhisperModel

# Use device="cpu", compute_type="int8" if you don't have a CUDA GPU
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3")
for seg in segments:
    print(f"[{seg.start:.1f}s] {seg.text}")
Batch process video files
For video files, extract audio with FFmpeg first: ffmpeg -i video.mp4 -vn audio.mp3, then transcribe.
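A batch pass over a folder of videos might look like the sketch below. The helper just builds the FFmpeg command shown above; the folder layout and driver loop are illustrative assumptions:

```python
import subprocess
from pathlib import Path

def extract_audio_cmd(video: Path) -> list[str]:
    """Build the FFmpeg command from the step above: -vn drops the video stream."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", str(video.with_suffix(".mp3"))]

def batch_transcribe(folder: str, model) -> None:
    """Illustrative driver: extract audio from each .mp4, then transcribe it."""
    for video in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(extract_audio_cmd(video), check=True)
        segments, _info = model.transcribe(str(video.with_suffix(".mp3")))
        for seg in segments:
            print(f"[{seg.start:.1f}s] {seg.text}")
```

Passing a WhisperModel instance in rather than creating one per file keeps the model loaded across the whole batch.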
faster-whisper can transcribe about 1 hour of audio in roughly 2 minutes on an RTX 3060
Export subtitles (SRT)
Pass word_timestamps=True to transcribe() to get word-level timing for SRT export. Tools such as Subtitle Edit can import the result.
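A minimal SRT writer, assuming segment objects with start, end, and text attributes (the shape faster-whisper's segments have):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT expects."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render an iterable of segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Write the result to a .srt file next to the audio and most players will pick it up automatically.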