Model Library v2.4

AI Models for Local Inference

94 models with exact VRAM requirements at FP16, Q8, Q4, and Q2 quantization. Select any model to see which GPUs can run it, and at what quality.
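The relationship between parameter count, quantization, and VRAM can be sketched in a few lines. This is a simplified estimate, not the exact figures from the catalog: it assumes a dense model, round bits-per-weight values, and a flat ~10% overhead for activations and KV cache, whereas real quant formats (e.g. GGUF Q4_K_M) vary slightly.

```python
# Rough VRAM estimate: weight bytes plus ~10% runtime overhead.
# Bits per weight are idealized round numbers, not exact quant sizes.
BITS = {"FP16": 16, "Q8": 8, "Q4": 4, "Q2": 2}

def vram_gb(params_billion: float, quant: str, overhead: float = 1.10) -> float:
    bytes_per_weight = BITS[quant] / 8
    return round(params_billion * bytes_per_weight * overhead, 1)

print(vram_gb(8, "Q4"))    # ~4.4 GB for an 8B model at Q4
print(vram_gb(8, "FP16"))  # ~17.6 GB for the same model unquantized
```

The Q4 estimate lands close to the catalog's 8B entries (Qwen3 8B at 4.4 GB, Llama 3.1 8B at 5 GB), which is why an 8B model at Q4 comfortably fits a 6 GB GPU while FP16 does not.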

94 models indexed
49 run on 8 GB
58 chat
13 code
Recommendation

Starting out?

Llama 3.1 8B Q4 is the best entry point — runs on any GPU with 6 GB+ VRAM. Step up to Mistral 7B or Llama 3.1 13B if you have 12 GB+ VRAM.

View Llama 3.1 8B
Model Route Selector
/modelo/ decision layer

Pick a model route that fits your hardware

Intent-first guidance: these routes tie your intent to a minimum VRAM target, plus a recommended model and GPU, before you enter the full model catalog.

3 decision scenarios
9.3 GB avg min VRAM
12 GB highest route

Chat & Reasoning (58)

General-purpose LLMs for conversation and complex reasoning

Llama 3.1 405B 230 GB
Parameters 405B
Provider Meta
Context 131K tokens
llama-3.1-community View details →
DeepSeek R1 403 GB
Parameters 671B
Provider DeepSeek
Context 128K tokens
MIT View details →
DeepSeek V3.2 369.1 GB
Parameters 671B
Provider DeepSeek
Context 128K tokens
MIT View details →
DeepSeek V3 411 GB
Parameters 685B
Provider DeepSeek
Context 128K tokens
MIT View details →
Llama 3.3 70B 42 GB
Parameters 70B
Provider Meta
Context 128K tokens
Min GPU M4 Max 48GB
llama-3-community View details →
Qwen2.5 72B 41 GB
Parameters 72B
Provider Alibaba
Context 131K tokens
Min GPU M4 Max 48GB
Apache-2.0 View details →
Llama 3.1 70B 40 GB
Parameters 70B
Provider Meta
Context 131K tokens
Min GPU M4 Max 48GB
llama-3.1-community View details →
DeepSeek R1 Distill 32B 19.2 GB
Parameters 32B
Provider DeepSeek
Context 128K tokens
Min GPU RX 7900 XT
MIT View details →
Qwen3 235B-A22B 129.3 GB
Parameters 235B
Provider Alibaba
Context 131K tokens
Min GPU M3 Ultra
Apache-2.0 View details →
Qwen2.5 32B 19.2 GB
Parameters 32B
Provider Alibaba
Context 131K tokens
Min GPU RX 7900 XT
Apache-2.0 View details →
Command R+ 59 GB
Parameters 104B
Provider Cohere
Context 131K tokens
Min GPU M4 Ultra
CC-BY-NC-4.0 View details →
Qwen3.5 35B-A3B 19.3 GB
Parameters 35B
Provider Alibaba
Context 128K tokens
CPU speed 8 tok/s
Apache-2.0 View details →
Gemma 2 27B 15 GB
Parameters 27B
Provider Google
Context 8K tokens
Min GPU M1 Pro
Gemma View details →
Gemma 3 27B 16.2 GB
Parameters 27B
Provider Google
Context 128K tokens
Min GPU M3 Pro
Gemma View details →
Mistral Small 4 65.5 GB
Parameters 119B
Provider Mistral AI
Context 256K tokens
Min GPU M4 Ultra
Apache-2.0 View details →
Mixtral 8x7B 26 GB
Parameters 46.7B
Provider Mistral AI
Context 33K tokens
Min GPU RTX 5090
Apache-2.0 View details →
Mistral Small 3 14.4 GB
Parameters 24B
Provider Mistral AI
Context 33K tokens
Min GPU M1 Pro
Apache-2.0 View details →
Phi-4 8.4 GB
Parameters 14B
Provider Microsoft
Context 16K tokens
Min GPU RTX 3080
MIT View details →
Qwen3 32B 17.6 GB
Parameters 32B
Provider Alibaba
Context 128K tokens
CPU speed 2 tok/s
Apache-2.0 View details →
Qwen3 30B-A3B 16.5 GB
Parameters 30B
Provider Alibaba
Context 131K tokens
Min GPU M3 Pro
Apache-2.0 View details →
DeepSeek R1 Distill 14B 8.4 GB
Parameters 14B
Provider DeepSeek
Context 128K tokens
Min GPU RTX 3080
MIT View details →
Qwen3.5 27B 14.9 GB
Parameters 27B
Provider Alibaba
Context 128K tokens
CPU speed 3 tok/s
Apache-2.0 View details →
Magistral Small 24B 13.2 GB
Parameters 24B
Provider Mistral AI
Context 128K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
Yi 1.5 34B 20 GB
Parameters 34B
Provider 01.AI
Context 4K tokens
Min GPU RX 7900 XT
Apache-2.0 View details →
Qwen2.5 14B 8.4 GB
Parameters 14B
Provider Alibaba
Context 131K tokens
Min GPU RTX 3080
Apache-2.0 View details →
Mistral Small 3.2 13.2 GB
Parameters 24B
Provider Mistral AI
Context 128K tokens
CPU speed 1 tok/s
Apache-2.0 View details →
Qwen3 14B 7.7 GB
Parameters 14B
Provider Alibaba
Context 131K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
Gemma 3 12B 7.2 GB
Parameters 12B
Provider Google
Context 128K tokens
Min GPU RTX 3050 8GB
Gemma View details →
Phi-3 Medium 8 GB
Parameters 14B
Provider Microsoft
Context 128K tokens
Min GPU RTX 3050 8GB
MIT View details →
DeepSeek R1 Distill 8B 4.8 GB
Parameters 8B
Provider DeepSeek
Context 128K tokens
CPU speed 8 tok/s
MIT View details →
Mistral Nemo 12B 7 GB
Parameters 12B
Provider Mistral AI
Context 131K tokens
CPU speed 6 tok/s
Apache-2.0 View details →
Qwen3.5 9B 5 GB
Parameters 9B
Provider Alibaba
Context 128K tokens
CPU speed 12 tok/s
Apache-2.0 View details →
Qwen3 8B 4.4 GB
Parameters 8B
Provider Alibaba
Context 128K tokens
CPU speed 9 tok/s
Apache-2.0 View details →
Gemma 2 9B 5.5 GB
Parameters 9B
Provider Google
Context 8K tokens
Min GPU GTX 1660 Super
Gemma View details →
Phi-3.5 MoE 21 GB
Parameters 41.9B
Provider Microsoft
Context 131K tokens
Min GPU M4 Pro
MIT View details →
Phi-4 Mini 2.1 GB
Parameters 3.8B
Provider Microsoft
Context 128K tokens
CPU speed 30 tok/s
MIT View details →
Llama 3.1 8B 5 GB
Parameters 8B
Provider Meta
Context 131K tokens
CPU speed 7 tok/s
llama-3.1-community View details →
Qwen2.5 7B 4.5 GB
Parameters 7B
Provider Alibaba
Context 131K tokens
CPU speed 8 tok/s
Apache-2.0 View details →
DeepSeek V2 Lite 9 GB
Parameters 16B
Provider DeepSeek
Context 33K tokens
Min GPU RTX 3080
DeepSeek View details →
Mistral 7B 4.5 GB
Parameters 7B
Provider Mistral AI
Context 33K tokens
CPU speed 8 tok/s
Apache-2.0 View details →
Yi 1.5 9B 5.5 GB
Parameters 9B
Provider 01.AI
Context 4K tokens
Min GPU GTX 1660 Super
Apache-2.0 View details →
Phi-3 Small 4.5 GB
Parameters 7B
Provider Microsoft
Context 128K tokens
CPU speed 8 tok/s
MIT View details →
Qwen3.5 4B 2.6 GB
Parameters 4.66B
Provider Alibaba
Context 262K tokens
CPU speed 12 tok/s
Apache-2.0 View details →
Qwen3 4B 2.2 GB
Parameters 4B
Provider Alibaba
Context 131K tokens
CPU speed 15 tok/s
Apache-2.0 View details →
Gemma 3 4B 2.4 GB
Parameters 4B
Provider Google
Context 128K tokens
CPU speed 16 tok/s
Gemma View details →
Phi-3.5 Mini 2.3 GB
Parameters 3.8B
Provider Microsoft
Context 128K tokens
CPU speed 13 tok/s
MIT View details →
DeepSeek R1 Distill 1.5B 1 GB
Parameters 1.5B
Provider DeepSeek
Context 128K tokens
CPU speed 35 tok/s
MIT View details →
Yi 1.5 6B 3.7 GB
Parameters 6B
Provider 01.AI
Context 4K tokens
CPU speed 9 tok/s
Apache-2.0 View details →
Phi-3 Mini 2.5 GB
Parameters 3.8B
Provider Microsoft
Context 128K tokens
CPU speed 14 tok/s
MIT View details →
Qwen3.5 2B 1.2 GB
Parameters 2.27B
Provider Alibaba
Context 262K tokens
CPU speed 22 tok/s
Apache-2.0 View details →
Qwen3 1.7B 0.9 GB
Parameters 1.7B
Provider Alibaba
Context 131K tokens
CPU speed 35 tok/s
Apache-2.0 View details →
Gemma 2 2B 1.5 GB
Parameters 2B
Provider Google
Context 8K tokens
CPU speed 32 tok/s
Gemma View details →
Qwen2.5 3B 1.9 GB
Parameters 3B
Provider Alibaba
Context 131K tokens
CPU speed 20 tok/s
Apache-2.0 View details →
Llama 3.2 3B 1.8 GB
Parameters 3B
Provider Meta
Context 131K tokens
CPU speed 18 tok/s
llama-3.2-community View details →
Gemma 3 1B 0.7 GB
Parameters 1B
Provider Google
Context 128K tokens
CPU speed 42 tok/s
Gemma View details →
Qwen2.5 1.5B 1 GB
Parameters 1.5B
Provider Alibaba
Context 131K tokens
CPU speed 38 tok/s
Apache-2.0 View details →
Llama 3.2 1B 0.6 GB
Parameters 1B
Provider Meta
Context 131K tokens
CPU speed 52 tok/s
llama-3.2-community View details →
Qwen2.5 0.5B 0.35 GB
Parameters 0.5B
Provider Alibaba
Context 131K tokens
CPU speed 95 tok/s
Apache-2.0 View details →

Code Generation (13)

Specialized models for writing, reviewing, and explaining code

Qwen2.5-Coder 32B 19.2 GB
Parameters 32B
Provider Alibaba
Context 131K tokens
Min GPU RX 7900 XT
Apache-2.0 View details →
Qwen3-Coder-Next 80B-A3B 44 GB
Parameters 80B
Provider Alibaba
Context 262K tokens
Min GPU M4 Max 48GB
Apache-2.0 View details →
Qwen3-Coder 30B-A3B 16.5 GB
Parameters 30B
Provider Alibaba
Context 262K tokens
Min GPU M3 Pro
Apache-2.0 View details →
Devstral Small 2 24B 13.2 GB
Parameters 24B
Provider Mistral AI
Context 256K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
CodeLlama 34B 19 GB
Parameters 34B
Provider Meta
Context 16K tokens
Min GPU RX 7900 XT
llama-2-community View details →
DeepSeek Coder V2 9 GB
Parameters 16B
Provider DeepSeek
Context 131K tokens
Min GPU RTX 3080
DeepSeek View details →
Qwen2.5 Coder 14B 8 GB
Parameters 14B
Provider Alibaba
Context 131K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
StarCoder 2 15B 9 GB
Parameters 15B
Provider BigCode
Context 16K tokens
Min GPU RTX 3080
BigCode OpenRAIL-M v1 View details →
Qwen2.5-Coder 7B 4.2 GB
Parameters 7B
Provider Alibaba
Context 131K tokens
CPU speed 9 tok/s
Apache-2.0 View details →
StarCoder 2 7B 4.5 GB
Parameters 7B
Provider BigCode
Context 16K tokens
CPU speed 8 tok/s
BigCode OpenRAIL-M v1 View details →
CodeGemma 7B 4.5 GB
Parameters 7B
Provider Google
Context 8K tokens
Min GPU GTX 1660 Super
Gemma View details →
CodeLlama 7B 4.5 GB
Parameters 7B
Provider Meta
Context 16K tokens
CPU speed 8 tok/s
llama-2-community View details →
StarCoder 2 3B 1.9 GB
Parameters 3B
Provider BigCode
Context 16K tokens
CPU speed 18 tok/s
BigCode OpenRAIL-M v1 View details →
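A catalog like the one above is easy to filter programmatically by a VRAM budget, which is how a figure like "49 run on 8 GB" is derived. The snippet below is a small sketch using a handful of Code Generation entries transcribed from this page; the `fits` helper is illustrative, not part of any site API.

```python
# A small sample of (model name, VRAM needed in GB) pairs from the
# Code Generation section above.
catalog = [
    ("Qwen2.5-Coder 32B", 19.2),
    ("Qwen2.5 Coder 14B", 8.0),
    ("Qwen2.5-Coder 7B", 4.2),
    ("StarCoder 2 3B", 1.9),
]

def fits(budget_gb: float) -> list[str]:
    """Return the models whose VRAM requirement fits the budget."""
    return [name for name, need in catalog if need <= budget_gb]

print(fits(8.0))  # ['Qwen2.5 Coder 14B', 'Qwen2.5-Coder 7B', 'StarCoder 2 3B']
```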