Local Audio Transcription
Transcribe audio and video files locally with Whisper — private, fast, and free with no API costs.
Setup time
10–20 minutes
Min hardware
4 GB VRAM (GTX 1660 / RX 580) or CPU-only
Software
Whisper.cpp or faster-whisper
Recommended model
Whisper Large v3
Install faster-whisper
Run: pip install faster-whisper. faster-whisper is a reimplementation of OpenAI's Whisper built on CTranslate2, a fast C++ inference engine — whisper.cpp is the separate pure-C/C++ port.
Works on CPU if you don't have a GPU — just slower
Download the model
Models are downloaded automatically on first use. For CPU: use tiny or base. For GPU with 4+ GB VRAM: use large-v3.
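The model-size guidance above can be sketched as a small helper. This is illustrative only — the function name and VRAM threshold are our assumptions, not part of faster-whisper:

```python
def pick_model(has_gpu: bool, vram_gb: float = 0) -> tuple[str, str]:
    """Return an illustrative (model_size, compute_type) pair for WhisperModel,
    following the guidance above: large-v3 on a 4+ GB GPU, a small model on CPU."""
    if has_gpu and vram_gb >= 4:
        return "large-v3", "float16"  # full-accuracy model on GPU
    return "base", "int8"             # CPU fallback: small model, quantized

# Usage sketch:
# model_size, compute_type = pick_model(has_gpu=True, vram_gb=12)
# model = WhisperModel(model_size, device="cuda", compute_type=compute_type)
```

On first use, WhisperModel downloads the chosen weights automatically, so no separate download step is needed.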
Transcribe your first file
Run this Python script:
from faster_whisper import WhisperModel

# Use device="cpu", compute_type="int8" if you don't have a CUDA GPU
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3")
for seg in segments:
    print(f"[{seg.start:.1f}s] {seg.text}")
Batch process video files
For video files, extract audio with FFmpeg first: ffmpeg -i video.mp4 -vn audio.mp3, then transcribe.
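A batch pass over a folder of videos might look like the sketch below. The helper just builds the FFmpeg command shown above; the folder layout and driver loop are illustrative assumptions:

```python
import subprocess
from pathlib import Path

def extract_audio_cmd(video: Path) -> list[str]:
    """Build the FFmpeg command from the step above: -vn drops the video stream."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", str(video.with_suffix(".mp3"))]

def batch_transcribe(folder: str, model) -> None:
    """Illustrative driver: extract audio from each .mp4, then transcribe it."""
    for video in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(extract_audio_cmd(video), check=True)
        segments, _info = model.transcribe(str(video.with_suffix(".mp3")))
        for seg in segments:
            print(f"[{seg.start:.1f}s] {seg.text}")
```

Passing a WhisperModel instance in rather than creating one per file keeps the model loaded across the whole batch.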
faster-whisper can transcribe about 1 hour of audio in roughly 2 minutes on an RTX 3060
Export subtitles (SRT)
Pass word_timestamps=True to transcribe() to get word-level timing for SRT export. Tools such as Subtitle Edit can import the result.
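A minimal SRT writer, assuming segment objects with start, end, and text attributes (the shape faster-whisper's segments have):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT expects."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render an iterable of segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Write the result to a .srt file next to the audio and most players will pick it up automatically.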