By the RunAIatHome editorial team. Practical local AI setup notes based on current home-lab workflows.
How to Run DeepSeek Locally: Complete Guide (2025)
Step-by-step instructions to run DeepSeek R1 on your own hardware — with exact VRAM numbers for every model variant.
1. What Is DeepSeek?
DeepSeek R1 is a reasoning-focused large language model released by DeepSeek AI in early 2025. It uses chain-of-thought reasoning — showing its work before giving a final answer — which makes it particularly strong for coding, math, and logic tasks.
Running it locally gives you full privacy (no data sent to external servers), zero API costs, and offline access. Your documents and code stay on your machine.
One important clarification upfront: the full DeepSeek R1 model (671B parameters) is designed for data centers and requires hundreds of gigabytes of VRAM. For home use, the distill versions — 8B, 14B, and 32B — are the practical choice. They retain most of R1's reasoning capability at a fraction of the hardware requirements.
2. Which DeepSeek Model Can You Run?
VRAM is the limiting factor. Here are the exact requirements for each DeepSeek variant at Q4 quantization — the standard balance of quality and size:
| Model | Params | Q4 VRAM | Min GPU |
|---|---|---|---|
| DeepSeek R1 Distill 8B | 8B | ~4.8 GB | RTX 3060, GTX 1080 Ti |
| DeepSeek R1 Distill 14B | 14B | ~8.4 GB | RTX 3060 12GB, RTX 4070 |
| DeepSeek R1 Distill 32B | 32B | ~19.2 GB | RTX 4090, RTX 3090 |
| DeepSeek R1 Full | 671B MoE | ~403 GB | Multi-GPU data center |
Correction for common guides
Most guides say DeepSeek R1 needs "200 GB at Q4." That's wrong — it requires ~403 GB at Q4 quantization. The 200 GB figure refers to Q2, which degrades output quality significantly and is not suitable for serious tasks. For home use, stick to the Distill models.
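The Q4 figures above follow a simple rule of thumb: roughly 0.5 GB per billion parameters for the weights, plus about 20% overhead for the KV cache and runtime buffers. The 20% factor is our approximation, not an exact measurement — real usage varies with context length. A quick sanity check:

```shell
# Estimate Q4 VRAM: params (billions) x 0.5 GB, plus ~20% runtime overhead.
# The 20% overhead factor is a rule-of-thumb assumption, not a measured value.
for p in 8 14 32 671; do
  awk -v p="$p" 'BEGIN { printf "%sB -> ~%.1f GB at Q4\n", p, p * 0.5 * 1.2 }'
done
```

This reproduces the table above: ~4.8 GB (8B), ~8.4 GB (14B), ~19.2 GB (32B), and ~403 GB (671B).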
Not sure what your GPU can handle? Use our VRAM Calculator to check your exact GPU →
3. Requirements
GPU (VRAM)
8 GB VRAM: Runs Distill 8B at Q4 (4.8 GB). Minimum viable setup. RTX 3070, RTX 4060.
12 GB VRAM: Runs Distill 14B at Q4 (8.4 GB). Sweet spot for quality-to-VRAM ratio. RTX 3060 12GB, RTX 4070.
24 GB VRAM: Runs Distill 32B at Q4 (~19 GB). Best local reasoning quality. RTX 3090, RTX 4090.
System RAM and Storage
| Model | Min RAM | Disk Space |
|---|---|---|
| Distill 8B | 16 GB | ~5 GB |
| Distill 14B | 16 GB | ~10 GB |
| Distill 32B | 32 GB | ~20 GB |
4. Step-by-Step: Run DeepSeek with Ollama
Step 1: Check Your VRAM

Before downloading anything, verify you have enough VRAM for your target model. On Windows, open Task Manager → Performance → GPU. On Linux with an NVIDIA GPU:

```
nvidia-smi
```

Use the VRAM Calculator to confirm which DeepSeek model your GPU can handle.
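If you want to gate the download on actual free memory, here is a small sketch. The 5000 MiB threshold is our assumption, sized for the 8B model's ~4.8 GB Q4 footprint:

```shell
# Read free VRAM in MiB from the first GPU; fall back to 0 if nvidia-smi
# is unavailable so the script still reports something useful.
free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -n1)
free_mib=${free_mib:-0}
# ~4.8 GB for deepseek-r1:8b at Q4, rounded up to 5000 MiB (our assumption).
if [ "$free_mib" -ge 5000 ]; then
  echo "OK: ${free_mib} MiB free - deepseek-r1:8b should fit in VRAM"
else
  echo "Only ${free_mib} MiB free - expect CPU spillover or pick a smaller model"
fi
```

Swap the threshold for ~8600 (14B) or ~19700 (32B) if you are targeting a larger distill.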
Step 2: Install Ollama

Download from ollama.com/download for macOS or Windows. On Linux, run:

```
curl -fsSL https://ollama.com/install.sh | sh
```

See the Complete Ollama Guide for detailed installation steps per OS.
Step 3: Pull DeepSeek

Download the model that fits your VRAM.

8B (recommended for most setups):

```
ollama pull deepseek-r1:8b
```

14B (12 GB+ VRAM):

```
ollama pull deepseek-r1:14b
```

32B (24 GB+ VRAM):

```
ollama pull deepseek-r1:32b
```
Step 4: Run

Start an interactive chat session:

```
ollama run deepseek-r1:8b
```

The `<think>` block is normal. DeepSeek R1 shows its chain-of-thought reasoning before giving the final answer. This can take 30–60 seconds on complex questions. It is not an error or a hang.
Step 5: Use via API (optional, for developers)

Ollama exposes a REST API on port 11434. Query it from any script or app:

```
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Explain recursion with a simple example",
  "stream": false
}'
```
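With "stream": false, the reply arrives as a single JSON object whose answer sits in the "response" field (the real reply also carries timing and context fields). One dependency-free way to pull it out, shown here on an abridged sample reply rather than a live server:

```shell
# Abridged sample of a non-streaming /api/generate reply (a real reply has
# additional fields such as created_at, context, and total_duration).
sample='{"model":"deepseek-r1:8b","response":"Recursion is when a function calls itself.","done":true}'
# Extract the "response" field using only python3's standard library.
echo "$sample" | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])'
```

Pipe the output of the curl command above through the same one-liner to get just the model's answer.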
5. Common Issues
"Model is slow"
Check your VRAM usage with nvidia-smi. If the model doesn't fit entirely in VRAM, it spills to system RAM (CPU inference), which is 10–20x slower. Either switch to a smaller model or upgrade to a GPU with more VRAM.
"Out of memory" error
Your GPU doesn't have enough VRAM for the selected model. Switch to a smaller variant: if 14B fails, try 8B. If 8B still fails, you may need to close other applications consuming VRAM (browsers, games, other models).
"The model keeps thinking forever"
This is expected behavior for reasoning models. DeepSeek R1's chain-of-thought can run 30–60 seconds on complex problems. If it runs for several minutes without output, it may be stuck — try a shorter or more specific prompt.
6. DeepSeek R1 vs Distill: What's the Difference?
The full DeepSeek R1 (671B) uses a Mixture of Experts (MoE) architecture — it activates only a subset of parameters per token, which is why it needs less compute per inference than a dense 671B model, but still requires ~403 GB VRAM to load. That's multi-GPU server territory.
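The MoE trade-off can be put in numbers. DeepSeek-V3/R1's published figures are 671B total parameters with roughly 37B activated per token; using a rule-of-thumb 0.5 GB per billion parameters at Q4 plus ~20% overhead (our approximation):

```shell
# MoE in numbers: every parameter must be loaded, but only a small slice
# is active per token. 671B total / ~37B active are DeepSeek's published
# figures; the 0.6 GB-per-billion Q4 factor is a rule-of-thumb assumption.
awk 'BEGIN {
  total = 671; active = 37
  printf "Loaded for inference: %dB params (~%.0f GB at Q4)\n", total, total * 0.6
  printf "Active per token:     %dB params (~%.1f%% of the model)\n", active, active / total * 100
}'
```

That is why the full model is cheap on compute per token yet still out of reach for home VRAM budgets.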
The Distill models (8B, 14B, 32B) are dense models fine-tuned from R1's outputs. They are not MoE — they are smaller, fully-dense networks trained to replicate R1's reasoning behavior using knowledge distillation. The result: they load in a fraction of the VRAM and run fast on consumer GPUs.
The distill models retain 90%+ of R1's reasoning capability for everyday tasks — coding, math, analysis — in a fraction of the VRAM. For home use, they are the right choice, not a compromise.
Recommended GPUs for Running DeepSeek Locally

As a shopping rule of thumb: Distill 8B fits on 8 GB cards, 14B on 12 GB cards and up, and 32B on 24 GB cards (the Q4 weights themselves need ~4.8, ~8.4, and ~19.2 GB respectively, leaving headroom for context).

Prices and availability may change. Affiliate links.
| Tier | GPU | VRAM |
|---|---|---|
| Entry (8–12 GB) | RTX 4060 | 8 GB |
| Entry (8–12 GB) | RTX 3060 | 12 GB |
| Mid (12–16 GB) | RTX 4060 Ti 16GB | 16 GB |
| Mid (12–16 GB) | RTX 4070 | 12 GB |
| High (24 GB) | RTX 4090 | 24 GB |
| High (24 GB) | RTX 3090 | 24 GB |

Check Which DeepSeek Model Fits Your GPU
Enter your GPU model and get exact VRAM headroom for each DeepSeek variant.
VRAM Calculator →