
What GPU do I need for local AI?

Answer 5 questions and we'll tell you exactly what to buy

Javier Morales, Local AI and Hardware Specialist with 8 years of experience
GitHub: github.com/javier-morales-ia

How to pick the right GPU for local AI

Choosing the ideal GPU for local AI is not a single decision: it depends on the kind of task you want to do, the model sizes you plan to run, your operating system, and of course your available budget. This quiz analyses all those factors and gives you a concrete recommendation based on real performance data.

Unlike generic rankings that just surface "the best GPU", this quiz differentiates between specific use cases. Running Whisper to transcribe podcasts (you only need 3 GB of VRAM) is nothing like running Llama 3.1 70B for advanced research (you need 40+ GB or CPU offloading). The recommended GPU varies dramatically based on what you want to do.
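As a rough rule of thumb, a model's weights occupy about parameters × bits-per-weight ÷ 8 bytes, plus overhead for the KV cache and activations. Here's a minimal Python sketch of that back-of-the-envelope estimate (the 20% overhead factor is an illustrative assumption, not a measured constant; real usage varies with context length and runtime):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size plus ~20% for KV cache/activations.

    The 1.2 overhead factor is an assumption for illustration; long
    contexts and large batches need more headroom.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at Q4 fits a 12 GB card; a 70B model at Q4 overflows even 24 GB
print(f"7B @ Q4:  ~{estimate_vram_gb(7):.1f} GB")    # ~4.2 GB
print(f"70B @ Q4: ~{estimate_vram_gb(70):.1f} GB")   # ~42.0 GB
```

The same arithmetic explains the jump from "any modern GPU works" at 7B to the 40+ GB territory at 70B.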

Key factors the quiz weighs

  • Task type: Chat and text assistants, code generation, image creation (Stable Diffusion, Flux), audio transcription (Whisper) or research and fine-tuning. Each task has different VRAM and compute requirements.
  • Target model size: Whether your goal is to run 7B models for daily use or to experiment with 30B–70B models for research tasks. This determines the minimum VRAM you need.
  • Operating system: NVIDIA GPUs work on Windows and Linux; modern macOS no longer ships NVIDIA drivers, so they aren't an option on current Macs. AMD has better Linux support via ROCm. Apple M-series Macs offer unified memory and are the natural choice if you already own a Mac.
  • Budget: From entry-level options (RTX 3060 12 GB, ~$300) to high-end (RTX 4090 24 GB, ~$1,800). The quiz tailors the recommendation to your investment range.

Quick recommendations by user profile

While you take the quiz, here's a quick profile-based guide for context (a short script for checking your own card's VRAM follows the list):

  • Beginner: If this is your first foray into local AI and you don't want to invest much, the RTX 3060 12 GB (new or second-hand) is the ideal entry point. It runs 7B models comfortably and has enough VRAM to experiment with 13B models at Q4.
  • Active developer: If you use AI for coding, code review, and text generation daily, the RTX 4060 Ti 16 GB offers the best balance of price and capability. 16 GB of VRAM comfortably runs 14B models with very efficient power draw.
  • Advanced enthusiast: If you want to run 30B–70B models for research or as a home server, the RTX 4090 24 GB is the most powerful consumer option. It's also ideal for Flux.1 Dev and high-quality image generation.
  • Mac user: Apple M4 chips with 32–128 GB of unified memory are excellent for LLMs. There's no separate VRAM: all system memory is shared with the GPU. The M4 Pro with 48 GB is the sweet spot for 13B–30B models.
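If you already own an NVIDIA card and want to see where you land on this list, a quick check with PyTorch (assuming a CUDA-enabled install; on Apple Silicon the MPS backend applies instead) reports the VRAM you actually have to work with:

```python
import torch

# Report the detected GPU and its total VRAM; compare against the
# estimate_vram_gb() figures above to see which models will fit.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")
elif torch.backends.mps.is_available():
    print("Apple Silicon detected: all unified memory is shared with the GPU")
else:
    print("No supported GPU detected; models will run on CPU (slowly)")
```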

Common mistakes when buying a GPU for local AI

The quiz gives you a data-driven recommendation, but it also helps to know the most common buying mistakes so you can avoid them, whatever your final pick turns out to be:

  • Prioritising CUDA cores over VRAM: An RTX 4070 Super with 12 GB may look superior to an RTX 4060 Ti with 16 GB in gaming benchmarks, but for local AI VRAM is the limiting factor. A model that doesn't fit in VRAM won't run, regardless of compute power.
  • Buying the most expensive GPU without defining the goal: To run Llama 3.1 8B for daily use with Ollama, an RTX 3060 12 GB is plenty and costs three times less than an RTX 4090. Define what models you want to run first, then pick the minimum hardware needed.
  • Ignoring power draw: An RTX 4090 draws up to 450 W under full load. If you use it as an inference server 8 hours a day, the annual electricity cost can exceed $150 (see the worked calculation after this list). The RTX 40 series is noticeably more efficient for inference than older generations.
  • Underestimating the second-hand market: A second-hand RTX 3090 with 24 GB can cost $400–$600 and beats a new RTX 4070 on VRAM capacity. For local AI, older high-end models remain excellent value picks.
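To make the power-draw point concrete, here's the arithmetic behind that annual figure as a short Python sketch (the $0.15/kWh rate is an assumed example; substitute your local tariff, and note that inference rarely holds a card at its full rated draw):

```python
def annual_energy_cost(watts: float, hours_per_day: float,
                       price_per_kwh: float = 0.15) -> float:
    """Yearly electricity cost for a GPU at a given sustained power draw."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh

# RTX 4090 at its 450 W full-load draw, 8 hours per day
print(f"~${annual_energy_cost(450, 8):.0f} per year")  # ~$197
```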