M4 Ultra
Prós
- Runs Hermes 3 70B at Q4 natively
- 128 GB VRAM — adequate headroom
3 consumer GPUs can run Hermes 3 70B at Q4 natively. Precise VRAM thresholds and benchmarks below.
Prices and availability may change · affiliate link
llama.cpp 0.2.x · CUDA 12 · ROCm 6 · updated monthly · methodology →
This model requires aFlagship GPU (48 GB+ VRAM)
Best picks by compatibility, VRAM headroom, and value — prices and availability may change.
Prós
Prós
Prós
Alguns links são links de afiliado da Amazon. Podemos receber uma comissão sem custo adicional para si. O cookie da Amazon pode durar até 24 horas após o clique.
CPU vs GPU for Hermes 3 70B →
VRAM Calculator — instant compatibility check
M4 Ultra
128 GB · Runs Q4 natively · Check availability
*Prices and availability may change. Some links are affiliate links.
| Quantization | VRAM needed | Disk space | Quality |
|---|---|---|---|
| FP16 (max quality) | 140 GB | 140 GB | Maximum |
| Q8 (high quality) | 70 GB | 70 GB | Near-lossless |
| Q4 (recommended) Best balance | 40 GB | 40 GB | Recommended |
| Q2 (minimum) | 20 GB | 20 GB | Quality loss |
| Developer | Nous Research |
| Parameters | 70B |
| Context window | 131,072 tokens |
| License | llama-3.1-community |
| Use cases | agent, function-calling, reasoning, chat, roleplay |
| Released | 2024-08 |
Install with Ollama
ollama run hermes3:70b Hugging Face
NousResearch/Hermes-3-Llama-3.1-70B Hermes 3 70B requires <strong class="text-primary-container">40 GB VRAM</strong> at Q4. 3 consumer GPUs meet this threshold. Below 8 GB or 38 GB you'll hit significant offload latency.
3 Q4 native · 7 offload
| GPU Unit | VRAM | Compatibility | Est. Speed | Action |
|---|---|---|---|---|
| M4 Ultra | 128GB | Optimal | 45 tok/s | Calculate → |
| M3 Ultra | 192GB | Optimal | 38 tok/s | Calculate → |
| M4 Max 48GB | 48GB | Optimal | 20 tok/s | Calculate → |
| RTX 5090 | 32GB | Offload | — | Calculate → |
| RTX 4090 | 24GB | Offload | — | Calculate → |
| RTX 3090 | 24GB | Offload | — | Calculate → |
| RX 7900 XTX | 24GB | Offload | — | Calculate → |
| M4 Max 36GB | 36GB | Offload | — | Calculate → |
| RX 7900 XT | 20GB | Offload | — | Calculate → |
| M4 Pro | 24GB | Offload | — | Calculate → |
Best picks by compatibility, VRAM headroom, and value — prices and availability may change.
M4 Ultra
128 GB VRAM
Check availability →
M3 Ultra
192 GB VRAM
Check availability →
M4 Max 48GB
48 GB VRAM
Check availability →
Alguns links são links de afiliado da Amazon. Podemos receber uma comissão sem custo adicional para si. O cookie da Amazon pode durar até 24 horas após o clique.
Hermes 3 70B can run on CPU without a dedicated GPU — unusual for a 70B model. On an i7-13700K with llama.cpp Q4 it reaches 0.8 tok/s (slow but usable). With a GPU you get 4–6× more speed — check the VRAM calculator for specifics.
Which GPU is worth it? Real specs and benchmarks side by side.
GPUs that run Hermes 3 70B at Q4 — sorted by AI performance score.
Alguns links são links de afiliado da Amazon. Podemos receber uma comissão sem custo adicional para si. O cookie da Amazon pode durar até 24 horas após o clique.
Similar models in the agent category with comparable VRAM footprints.
The VRAM Calculator tells you exactly which quantization your hardware can handle.
M4 Ultra
Preços mudam diariamente