By the RunAIatHome editorial team. Ranking based on VRAM capacity, measured throughput, thermals, and software compatibility for local AI.
The 8 Best GPUs for Running AI Locally (2025)
Real benchmarks and realistic market bands. No marketing fluff — just the numbers that matter for running LLMs and Stable Diffusion on your own hardware.
Our recommendation
Best overall: RTX 4090. Best value: used RTX 3090.
If you want the pure-performance reference for local AI, the RTX 4090 is still on top. If you want to maximize VRAM per euro and can accept higher power draw, the used RTX 3090 remains the most rational buy for most enthusiasts.
- Outright winner: RTX 4090 for bandwidth, software support, and sustained throughput.
- Value winner: used RTX 3090 for its 24 GB and broad compatibility.
- Alternative buy: RX 7900 XTX if you already work comfortably on Linux with ROCm.
1. What Actually Matters for AI GPUs
Before diving into the list, a quick primer on the three specs that matter for local AI. These are not gaming benchmarks — AI workloads stress completely different parts of the GPU.
VRAM (Capacity)
Determines the largest model you can run. A 70B Q4 model needs ~36 GB. If you have 24 GB, you cannot run it fully on-GPU. This is a hard limit — no software trick can fix insufficient VRAM.
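To make that limit concrete, here is a minimal back-of-the-envelope sketch (our own illustration, not a measurement): weights-only VRAM is roughly parameter count times bits per weight divided by 8, and the KV cache plus runtime buffers add a few GB on top.

```python
# Rough weights-only VRAM estimate from parameter count and quantization
# width. Assumption: KV cache and runtime overhead add a few extra GB.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    # 1B params at 8 bits/weight = 1 GB, so GB = params_B * bits / 8
    return params_billion * bits_per_weight / 8

print(f"{weights_gb(70, 4):.1f} GB")   # 35.0 -> with overhead, the ~36 GB above
print(f"{weights_gb(8, 4.5):.1f} GB")  # 4.5  -> an 8B Q4 model fits an 8 GB card
```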
Memory Bandwidth
Determines tokens per second. LLM inference is memory-bandwidth-bound, not compute-bound. The RTX 4090 (1008 GB/s) generates tokens ~3x faster than the RTX 4060 (288 GB/s) on the same model.
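A quick way to see why bandwidth dominates: each generated token streams every weight through the GPU once, so throughput is capped at bandwidth divided by model size. A hedged sketch of that ceiling (the model size is an assumption for illustration; real numbers land below it):

```python
# Upper bound on decode speed for a bandwidth-bound LLM:
# tok/s <= memory bandwidth / bytes of weights read per token.

def tok_s_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

# An 8B Q4 model is ~4.5 GB of weights (assumed size):
for gpu, bw in [("RTX 4090", 1008), ("RTX 3090", 936), ("RTX 4060", 288)]:
    print(f"{gpu}: <= {tok_s_ceiling(bw, 4.5):.0f} tok/s")
```

Measured throughput sits well under these ceilings, but the ratios track: 1008 / 288 is the ~3x gap quoted above.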
Power (TDP)
Affects electricity cost and heat. An RTX 4090 at 450W running for long daily sessions adds a visible monthly power cost. Relevant if you run models continuously or live in a hot climate.
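For a rough sense of that cost, multiply watts by hours by your tariff. A small sketch; the 0.25 EUR/kWh rate is a placeholder assumption, not a quoted price:

```python
# Monthly electricity cost of a GPU under sustained load.
# eur_per_kwh is a placeholder; substitute your local tariff.

def monthly_cost_eur(watts: float, hours_per_day: float,
                     eur_per_kwh: float = 0.25) -> float:
    kwh_per_month = watts / 1000 * hours_per_day * 30
    return kwh_per_month * eur_per_kwh

print(f"{monthly_cost_eur(450, 4):.2f} EUR")  # RTX 4090, 4 h/day: ~13.50 EUR/month
```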
Want to check if a specific model fits your GPU? Use our VRAM Calculator — enter your GPU model and the AI model you want to run, and get exact VRAM headroom.
2. Quick Comparison Table
| GPU | VRAM | BW (GB/s) | 70B Q4 (tok/s) | SDXL (s/img) | TDP | Market band |
|---|---|---|---|---|---|---|
| RTX 4090 | 24 GB | 1008 | ~45 | ~2.5 | 450W | Flagship |
| RTX 4080 Super | 16 GB | 736 | N/A | ~3.5 | 320W | High-end |
| RTX 4070 Ti Super | 16 GB | 672 | N/A | ~4.0 | 285W | Upper mid-range |
| RTX 3090 (used) | 24 GB | 936 | ~32 | ~4.0 | 350W | Used high-end value |
| RX 7900 XTX | 24 GB | 960 | ~28 | ~5.0 | 355W | High-end value |
| RTX A6000 | 48 GB | 768 | ~35 | ~4.5 | 300W | Workstation premium |
| M4 Max (128 GB) | 128 GB* | 546 | ~18 | ~8.0 | ~40W | Apple premium |
| Arc A770 16 GB | 16 GB | 560 | N/A | ~7.0 | 225W | Budget wildcard |
* Unified memory (shared between CPU and GPU). Benchmarks measured with llama.cpp / Ollama, Stable Diffusion via ComfyUI / A1111. "N/A" means the GPU lacks sufficient VRAM to run the model entirely on-GPU.
Reference prices in euros (2025)
US list prices rarely reflect what you pay in Europe. The table below gives approximate euro prices for each GPU covered, with its VRAM, real inference speed on Llama 3.1 8B Q4, and the most suitable use case. Used prices can swing 15-20% depending on the time of year.
| GPU | VRAM | Approx. price | Speed (tok/s) | Use case |
|---|---|---|---|---|
| RTX 4090 | 24 GB | ~€1,999 | ~90 tok/s | Production / large models |
| RTX 4080 Super | 16 GB | ~€899 | ~65 tok/s | Advanced development |
| RTX 4070 Ti Super | 16 GB | ~€699 | ~55 tok/s | Price/performance balance |
| RTX 3090 (used) | 24 GB | ~€799 | ~70 tok/s | Budget 24 GB alternative |
| RTX 4070 Super | 12 GB | ~€549 | ~45 tok/s | Gaming + local AI |
| RTX 4060 Ti 16 GB | 16 GB | ~€399 | ~35 tok/s | Mid-range budget |
| RTX 3060 12 GB | 12 GB | ~€269 | ~30 tok/s | Entry-level |
Indicative euro prices for the European market (April 2025). Speeds measured with Llama 3.1 8B Q4_K_M on Ollama. Used prices reflect platforms such as Wallapop, eBay.es, and Back Market.
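One way to read this table is euros per GB of VRAM and euros per token/s. A small sketch using three rows from above (prices and speeds copied from the table; the metric itself is our own illustration):

```python
# Value metrics from the euro price table: EUR per GB of VRAM and
# EUR per tok/s (Llama 3.1 8B Q4 speeds from the same table).

cards = {
    "RTX 4090":         (1999, 24, 90),   # (price EUR, VRAM GB, tok/s)
    "RTX 3090 (used)":  (799, 24, 70),
    "RTX 4060 Ti 16GB": (399, 16, 35),
}
for name, (eur, vram, tok_s) in cards.items():
    print(f"{name}: {eur / vram:.0f} EUR/GB, {eur / tok_s:.1f} EUR per tok/s")
# The 4060 Ti wins on raw EUR/GB; the 3090 offers the cheapest 24 GB,
# which is why it takes the value pick for large models.
```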
3. Detailed Reviews
1. NVIDIA RTX 4090
Best Overall
VRAM: 24 GB GDDR6X · Bandwidth: 1008 GB/s · TDP: 450W · Market band: Flagship
The undisputed king of consumer AI. Runs models up to ~34B Q4 entirely in VRAM with room to spare for context; 70B Q4 still needs partial CPU offload, since the weights alone are ~36 GB. Stable Diffusion at absurd speeds. The only downsides: price and the 450W TDP.
2. NVIDIA RTX 4080 Super
VRAM: 16 GB GDDR6X · Bandwidth: 736 GB/s · TDP: 320W · Market band: High-end
Excellent for models up to 13B at high precision (Q8); 34B Q4 is a tight fit in 16 GB. Not enough VRAM for 70B. Great Stable Diffusion performance. A good balance if the 4090 is out of budget.
3. NVIDIA RTX 4070 Ti Super
Best Value Mid-High
VRAM: 16 GB GDDR6X · Bandwidth: 672 GB/s · TDP: 285W · Market band: Upper mid-range
The 16 GB sweet spot at a reasonable price. Runs 13B models at Q8 (FP16 would need ~26 GB and does not fit) and handles Stable Diffusion XL comfortably. Better value than the 4080 Super for most local AI use cases.
4. NVIDIA RTX 3090
Best Value Overall
VRAM: 24 GB GDDR6X · Bandwidth: 936 GB/s · TDP: 350W · Market band: Used high-end value
The best value proposition in AI GPUs right now. 24 GB VRAM — same as the RTX 4090 — at 40% of the price on the used market. Slower per-token than the 4090, but you get the same model capacity. Power hungry at 350W.
5. AMD RX 7900 XTX
Best AMD
VRAM: 24 GB GDDR6 · Bandwidth: 960 GB/s · TDP: 355W · Market band: High-end value
24 GB VRAM at a lower price than NVIDIA equivalents. The catch: ROCm only works well on Linux, and some AI tools lack AMD optimization. If you run Linux and are comfortable troubleshooting, this is excellent value.
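If you go this route, verify the stack before building a workflow on it. A minimal check, assuming a ROCm build of PyTorch (ROCm builds reuse the CUDA device API on AMD):

```python
# Sanity-check that a ROCm PyTorch build actually sees the GPU.
# On ROCm builds, torch.cuda.* maps to HIP, so these calls work on AMD.
import torch

print(torch.cuda.is_available())   # True if the ROCm stack is functional
print(torch.version.hip)           # set on ROCm builds, None on CUDA builds
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the RX 7900 XTX
```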
6. NVIDIA RTX A6000
Most VRAM (Single Card)
VRAM: 48 GB GDDR6 · Bandwidth: 768 GB/s · TDP: 300W · Market band: Workstation premium
48 GB VRAM in a single card: the only card here that runs 70B Q4 fully in VRAM without multi-GPU (Q8 would need ~74 GB). A professional workstation card rather than a gaming one, and increasingly popular with AI enthusiasts buying used.
7. Apple M4 Max (128 GB)
Best for Large Models
VRAM: 128 GB unified · Bandwidth: 546 GB/s · TDP: ~40W (chip) · Market band: Apple premium
The unified memory architecture means system RAM = GPU VRAM. 128 GB lets you run models that no discrete GPU can touch, including 70B at Q8 (FP16 would need ~140 GB and does not fit even here). Token speed is lower due to bandwidth, but the machine is nearly silent and draws ~40W. Unique in the market.
8. Intel Arc A770 16 GB
Budget Wildcard
VRAM: 16 GB GDDR6 · Bandwidth: 560 GB/s · TDP: 225W · Market band: Budget wildcard
The dark horse. 16 GB VRAM in a budget band looks excellent on paper. The reality: Intel AI software (IPEX-LLM, SYCL) is maturing but still behind CUDA and ROCm. If you enjoy tinkering and want cheap VRAM, it is worth watching. Not recommended as a primary AI GPU today.
4. Which GPU Is Right for You?
"I want to run 7B-13B models and Stable Diffusion"
You need 12-16 GB VRAM. Best picks:
- Budget: RTX 3060 12 GB (used value tier) or Intel Arc A770 16 GB (budget wildcard)
- Sweet spot: RTX 4070 Ti Super 16 GB (upper mid-range)
"I want to run 70B models locally"
You need 24+ GB VRAM. The model at Q4 needs ~36 GB, so even 24 GB requires partial CPU offload (see the sketch after this list). Best picks:
- Best value: RTX 3090 used (used high-end value) — 24 GB VRAM, solid performance
- Best performance: RTX 4090 (flagship) — fastest tokens/sec at 24 GB
- Full in-VRAM: RTX A6000 48 GB (workstation tier) — no offload needed
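To size the offload on a 24 GB card, a hedged sketch (the model size and layer count are approximations; llama.cpp exposes the split as its -ngl / --gpu-layers option):

```python
# Estimate how many of a 70B Q4 model's layers fit on a 24 GB GPU.
# Assumptions: ~36 GB of weights, 80 transformer layers, ~2 GB reserved
# for KV cache and buffers. Whole layers are offloaded, as llama.cpp does.

MODEL_GB = 36.0
N_LAYERS = 80
BUDGET_GB = 24.0 - 2.0

gb_per_layer = MODEL_GB / N_LAYERS
n_gpu_layers = int(BUDGET_GB / gb_per_layer)
print(n_gpu_layers)  # ~48 of 80 layers on GPU (e.g. llama.cpp -ngl 48)
```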
"I want a silent, low-power AI setup"
Apple Silicon is the only real option. The M4 Max with 128 GB unified memory runs 70B models at Q8 with room to spare, something no consumer GPU can do.
- Best pick: Mac Studio M4 Max or MacBook Pro M4 Max — 128 GB, ~40W, zero fan noise
- Tradeoff: Lower tokens/sec than discrete GPUs, locked to macOS
"I'm on Linux and want the best price/VRAM ratio"
AMD GPUs offer more VRAM per dollar, but require ROCm (Linux only).
- Best pick: RX 7900 XTX (high-end value) — 24 GB, competitive performance on Linux
- Caveat: Not all AI tools have ROCm support. Check compatibility before buying.
5. Benchmark Methodology
All benchmarks were collected from community-reported results, hardware review sites, and our own testing. The numbers represent typical real-world performance, not theoretical peaks.
| Benchmark | Setup |
|---|---|
| LLM tok/s | Ollama / llama.cpp, Llama 3 70B Q4_K_M, 2048 context, eval throughput (not prompt processing) |
| SDXL s/img | ComfyUI, SDXL 1.0 base, 1024x1024, 20 DPM++ 2M Karras steps, FP16 |
| VRAM usage | Peak VRAM reported by nvidia-smi / rocm-smi during inference |
Your actual performance will vary based on system RAM, CPU, driver version, quantization method, and context length. These numbers are useful for relative comparison, not absolute guarantees.
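If you want to reproduce the eval-throughput figure on your own hardware, Ollama's local REST API reports per-request token counts and timings. A minimal sketch (the model tag is whatever you have pulled locally; eval_duration is in nanoseconds):

```python
# Compute decode tok/s from Ollama's /api/generate response fields.
# eval_count = tokens generated, eval_duration = generation time in ns.
import requests

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3:70b",   # substitute any locally pulled model tag
    "prompt": "Explain KV caches in two sentences.",
    "stream": False,
})
stats = r.json()
tok_s = stats["eval_count"] / (stats["eval_duration"] / 1e9)
print(f"{tok_s:.1f} tok/s")   # eval throughput, not prompt processing
```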
6. Frequently Asked Questions
What is the best GPU for running AI locally in 2025?
The NVIDIA RTX 4090 is the best overall. For value, the used RTX 3090 is hard to beat thanks to its 24 GB VRAM. For silent operation and maximum model capacity, the Apple M4 Max 128 GB is unique.
Can AMD GPUs run AI models locally?
Yes. The RX 7900 XTX works well with ROCm on Linux. Ollama and llama.cpp both support AMD GPUs. However, Windows support is very limited, and not all AI tools (ComfyUI extensions, fine-tuning frameworks) have ROCm backends.
Is Apple M4 Max good for running LLMs?
Excellent for model capacity: 128 GB unified memory means you can run 70B at Q8, or even larger models, entirely in memory. Token speed is lower (~18 tok/s for 70B Q4) compared to the RTX 4090 (~45 tok/s), but the silence, low power, and massive memory make it unique.
How many tokens/sec do I need?
15-20 tok/s feels like natural conversation. Below 10 feels slow. For code generation or batch processing, even 5 tok/s is fine since you are not waiting interactively. Stable Diffusion is measured in seconds per image instead.
Should I buy an RTX 4090 or two RTX 3090s?
For simplicity: one RTX 4090. Multi-GPU requires compatible software, a motherboard with enough PCIe lanes, a large PSU, and adds debugging complexity. Two 3090s give 48 GB total and can be attractive for large models, but setup is not trivial.
Check What Your GPU Can Run
Enter your GPU model and see exactly which AI models fit in your VRAM — with headroom calculations.