
By the RunAIatHome editorial team. Practical local AI setup notes based on current home-lab workflows.

How to Run DeepSeek Locally: Complete Guide (2025)

Step-by-step instructions to run DeepSeek R1 on your own hardware — with exact VRAM numbers for every model variant.

1. What Is DeepSeek?

DeepSeek R1 is a reasoning-focused large language model released by DeepSeek AI in early 2025. It uses chain-of-thought reasoning — showing its work before giving a final answer — which makes it particularly strong for coding, math, and logic tasks.

Running it locally gives you full privacy (no data sent to external servers), zero API costs, and offline access. Your documents and code stay on your machine.

One important clarification upfront: the full DeepSeek R1 model (671B parameters) is designed for data centers and requires hundreds of gigabytes of VRAM. For home use, the distill versions — 8B, 14B, and 32B — are the practical choice. They retain most of R1's reasoning capability at a fraction of the hardware requirements.

2. Which DeepSeek Model Can You Run?

VRAM is the limiting factor. Here are the exact requirements for each DeepSeek variant at Q4 quantization — the standard balance of quality and size:

Model | Params | Q4 VRAM | Min GPU
DeepSeek R1 Distill 8B | 8B | ~4.8 GB | RTX 3060, GTX 1080 Ti
DeepSeek R1 Distill 14B | 14B | ~8.4 GB | RTX 3060 12GB, RTX 4070
DeepSeek R1 Distill 32B | 32B | ~19.2 GB | RTX 3090, RTX 4090
DeepSeek R1 Full | 671B (MoE) | ~403 GB | Multi-GPU data center
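As a sanity check, the Q4 figures in the table work out to roughly 4.8 bits per parameter (typical for Q4_K_M-style quantization). A minimal estimator, assuming that rate:

```python
def q4_vram_gb(params_billions: float, bits_per_param: float = 4.8) -> float:
    """Rough VRAM needed for model weights at a given quantization rate.

    Ignores KV cache and runtime overhead, which add roughly 1-2 GB in practice.
    """
    return params_billions * bits_per_param / 8

print(q4_vram_gb(8))    # 4.8 GB
print(q4_vram_gb(14))   # 8.4 GB
print(q4_vram_gb(32))   # 19.2 GB
print(q4_vram_gb(671))  # 402.6 GB, i.e. the ~403 GB in the table
```

The same formula explains the ~200 GB Q2 figure mentioned below: at roughly 2.4 bits per parameter, 671B parameters land near 200 GB.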

Correction for common guides

Most guides say DeepSeek R1 needs "200 GB at Q4." That's wrong — it requires ~403 GB at Q4 quantization. The 200 GB figure refers to Q2, which degrades output quality significantly and is not suitable for serious tasks. For home use, stick to the Distill models.

Not sure what your GPU can handle? Use our VRAM Calculator to check your exact GPU →

3. Requirements

GPU (VRAM)

8 GB VRAM: Runs Distill 8B at Q4 (4.8 GB). Minimum viable setup. RTX 3070, RTX 4060.

12 GB VRAM: Runs Distill 14B at Q4 (8.4 GB). Sweet spot for quality-to-VRAM ratio. RTX 3060 12GB, RTX 4070.

24 GB VRAM: Runs Distill 32B at Q4 (~19 GB). Best local reasoning quality. RTX 3090, RTX 4090.
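The tiers above can be sketched as a small lookup. This is a rough heuristic, not an official sizing tool: the footprints come from the table in section 2, and the 1.5 GB headroom for KV cache and desktop overhead is an assumption.

```python
# Q4 weight footprints (GB) from the table above; keys are Ollama model tags.
Q4_SIZES = {"deepseek-r1:8b": 4.8, "deepseek-r1:14b": 8.4, "deepseek-r1:32b": 19.2}

def pick_model(vram_gb: float, headroom_gb: float = 1.5):
    """Return the largest distill whose Q4 weights fit in VRAM, leaving
    headroom for KV cache and the OS; None if even the 8B does not fit."""
    fitting = [m for m, gb in Q4_SIZES.items() if gb + headroom_gb <= vram_gb]
    return max(fitting, key=Q4_SIZES.get) if fitting else None

print(pick_model(8))   # deepseek-r1:8b
print(pick_model(12))  # deepseek-r1:14b
print(pick_model(24))  # deepseek-r1:32b
```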

System RAM and Storage

Model | Min RAM | Disk Space
Distill 8B | 16 GB | ~5 GB
Distill 14B | 16 GB | ~10 GB
Distill 32B | 32 GB | ~20 GB

4. Step-by-Step: Run DeepSeek with Ollama

  1. Check Your VRAM

    Before downloading anything, verify you have enough VRAM for your target model. On Windows, open Task Manager → Performance → GPU. On Linux/macOS with NVIDIA:

    nvidia-smi

    Use the VRAM Calculator to confirm which DeepSeek model your GPU can handle.
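If you want to check VRAM from a script instead, nvidia-smi can emit machine-readable CSV via `nvidia-smi --query-gpu=memory.total,memory.free --format=csv,noheader,nounits`. A small parser for that output (the sample line below is a hypothetical 12 GB card, so the snippet runs even without a GPU):

```python
def parse_nvidia_smi_csv(line: str) -> dict:
    """Parse one line of `nvidia-smi --query-gpu=memory.total,memory.free
    --format=csv,noheader,nounits`; values are reported in MiB."""
    total_mib, free_mib = (int(x) for x in line.split(","))
    return {"total_gb": total_mib / 1024, "free_gb": free_mib / 1024}

# Sample output line from a hypothetical 12 GB card with ~11 GB free:
print(parse_nvidia_smi_csv("12288, 11000"))
```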

  2. Install Ollama

    Download from ollama.com/download for macOS or Windows. On Linux, run:

    curl -fsSL https://ollama.com/install.sh | sh

    See the Complete Ollama Guide for detailed installation steps per OS.

  3. Pull DeepSeek

    Download the model that fits your VRAM:

    8B (recommended for most setups):

    ollama pull deepseek-r1:8b

    14B (12GB+ VRAM):

    ollama pull deepseek-r1:14b

    32B (24GB+ VRAM):

    ollama pull deepseek-r1:32b
  4. Run

    Start an interactive chat session:

    ollama run deepseek-r1:8b

    The <think> block is normal. DeepSeek R1 shows its chain-of-thought reasoning before giving the final answer. This can take 30–60 seconds on complex questions. It is not an error or a hang.
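If you consume R1's output in a script, you usually want the final answer without the reasoning block. A minimal sketch that strips the `<think>…</think>` span (the sample text is made up for illustration):

```python
import re

def strip_think(text: str) -> str:
    """Remove the <think>...</think> reasoning block R1 emits before its answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Hypothetical R1 output:
raw = "<think>User wants 2+2. That is 4.</think>\n2 + 2 = 4."
print(strip_think(raw))  # 2 + 2 = 4.
```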

  5. Use via API (optional, for developers)

    Ollama exposes a REST API on port 11434. Query it from any script or app:

    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:8b",
      "prompt": "Explain recursion with a simple example",
      "stream": false
    }'
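The same request from Python, using only the standard library. This is a sketch that assumes Ollama is running on its default port with the model already pulled:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "deepseek-r1:8b") -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, url: str = "http://localhost:11434/api/generate") -> str:
    """POST the prompt to a locally running Ollama server, return the reply text."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server:
# print(generate("Explain recursion with a simple example"))
```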

5. Common Issues

"Model is slow"

Check your VRAM usage with nvidia-smi. If the model doesn't fit entirely in VRAM, it spills to RAM (CPU inference), which is 10–20x slower. Either switch to a smaller model or add more VRAM.
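To get a feel for how badly spill hurts, here is a back-of-the-envelope estimate. It assumes per-layer time is uniform and CPU layers run about 15x slower (the midpoint of the 10-20x above), both simplifying assumptions:

```python
def est_slowdown(model_gb: float, free_vram_gb: float,
                 cpu_penalty: float = 15.0) -> float:
    """Rough overall slowdown when a fraction of the model spills to RAM.

    Assumes uniform per-layer cost and CPU layers cpu_penalty x slower than GPU.
    """
    offload = max(0.0, 1.0 - free_vram_gb / model_gb)  # fraction of model on CPU
    return (1 - offload) + offload * cpu_penalty

print(est_slowdown(8.4, 8.4))    # 1.0: the 14B fully in 8.4 GB of free VRAM
print(est_slowdown(19.2, 12.0))  # 32B on a 12 GB card: several times slower
```

Even a modest spill dominates total latency, which is why dropping to a smaller model is usually faster than partial offload.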

"Out of memory" error

Your GPU doesn't have enough VRAM for the selected model. Switch to a smaller variant: if 14B fails, try 8B. If 8B still fails, you may need to close other applications consuming VRAM (browsers, games, other models).

"The model keeps thinking forever"

This is expected behavior for reasoning models. DeepSeek R1's chain-of-thought can run 30–60 seconds on complex problems. If it runs for several minutes without output, it may be stuck — try a shorter or more specific prompt.

6. DeepSeek R1 vs Distill: What's the Difference?

The full DeepSeek R1 (671B) uses a Mixture of Experts (MoE) architecture — it activates only a subset of parameters per token, which is why it needs less compute per inference than a dense 671B model, but still requires ~403 GB VRAM to load. That's multi-GPU server territory.

The Distill models (8B, 14B, 32B) are dense models fine-tuned from R1's outputs. They are not MoE — they are smaller, fully-dense networks trained to replicate R1's reasoning behavior using knowledge distillation. The result: they load in a fraction of the VRAM and run fast on consumer GPUs.

The distill models retain 90%+ of R1's reasoning capability for everyday tasks — coding, math, analysis — in a fraction of the VRAM. For home use, they are the right choice, not a compromise.
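The MoE arithmetic is worth spelling out. Per DeepSeek's published figures, the 671B model activates roughly 37B parameters per token; memory scales with total parameters (every expert must be resident), while per-token compute scales with active parameters:

```python
TOTAL_B, ACTIVE_B = 671, 37  # DeepSeek-reported totals for R1 (MoE)

# All weights must be loaded, so VRAM tracks TOTAL parameters
# (at ~4.8 bits/param for Q4):
vram_q4_gb = TOTAL_B * 4.8 / 8

# Each token only touches the routed experts, so compute tracks ACTIVE params:
active_fraction = ACTIVE_B / TOTAL_B

print(round(vram_q4_gb), f"{active_fraction:.1%}")  # 403 5.5%
```

That is why the full R1 is cheap per token relative to its size but still impossible to load on consumer hardware, while a dense 32B distill fits on one GPU.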

Recommended GPUs for Running DeepSeek Locally

These tiers give comfortable headroom over the Q4 footprints above: a GPU with ~8 GB VRAM for the 8B Distill (4.8 GB), 16 GB for the 14B (8.4 GB), and 24 GB for the 32B (19.2 GB).

Prices and availability may change. Affiliate links.

Entry Tier

8–12 GB VRAM

RTX 4060

8 GB VRAM
Check availability →

RTX 3060

12 GB VRAM
Check availability →

Mid Tier

12–16 GB VRAM

RTX 4060 Ti 16GB

16 GB VRAM
Check availability →

RTX 4070

12 GB VRAM
Check availability →

High Tier

24 GB VRAM

RTX 4090

24 GB VRAM
Check availability →

RTX 3090

24 GB VRAM
Check availability →

Check Which DeepSeek Model Fits Your GPU

Enter your GPU model and get exact VRAM headroom for each DeepSeek variant.

VRAM Calculator →