Model Library v2.4

AI Models for Local Inference

94 models with exact VRAM requirements at FP16, Q8, Q4, and Q2 quantization. Select any model to see which GPUs can run it, and at what quality.
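The relationship between parameter count, quantization, and VRAM can be sketched in a few lines. This is a simplified estimate, not the exact figures from the catalog: it assumes a dense model, round bits-per-weight values, and a flat ~10% overhead for activations and KV cache, whereas real quant formats (e.g. GGUF Q4_K_M) vary slightly.

```python
# Rough VRAM estimate: weight bytes plus ~10% runtime overhead.
# Bits per weight are idealized round numbers, not exact quant sizes.
BITS = {"FP16": 16, "Q8": 8, "Q4": 4, "Q2": 2}

def vram_gb(params_billion: float, quant: str, overhead: float = 1.10) -> float:
    bytes_per_weight = BITS[quant] / 8
    return round(params_billion * bytes_per_weight * overhead, 1)

print(vram_gb(8, "Q4"))    # ~4.4 GB for an 8B model at Q4
print(vram_gb(8, "FP16"))  # ~17.6 GB for the same model unquantized
```

The Q4 estimate lands close to the catalog's 8B entries (Qwen3 8B at 4.4 GB, Llama 3.1 8B at 5 GB), which is why an 8B model at Q4 comfortably fits a 6 GB GPU while FP16 does not.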

94 models indexed
49 run on 8 GB
58 chat
13 code
Recommendation

Starting out?

Llama 3.1 8B Q4 is the best entry point — runs on any GPU with 6 GB+ VRAM. Step up to Mistral 7B or Llama 3.1 13B if you have 12 GB+ VRAM.

View Llama 3.1 8B
Model Route Selector
/modelo/ decision layer

Pick a model route that fits your hardware

Intent-first guidance: these routes tie your intent to a minimum VRAM target, plus a recommended model and GPU, before you enter the full model catalog.

3 decision scenarios
9.3 GB avg min VRAM
12 GB highest route

Chat & Reasoning (58)

General-purpose LLMs for conversation and complex reasoning

Llama 3.1 405B 230 GB
Parameters 405B
Provider Meta
Context 131K tokens
llama-3.1-community View details →
DeepSeek R1 403 GB
Parameters 671B
Provider DeepSeek
Context 128K tokens
MIT View details →
DeepSeek V3.2 369.1 GB
Parameters 671B
Provider DeepSeek
Context 128K tokens
MIT View details →
DeepSeek V3 411 GB
Parameters 685B
Provider DeepSeek
Context 128K tokens
MIT View details →
Llama 3.3 70B 42 GB
Parameters 70B
Provider Meta
Context 128K tokens
Min GPU M4 Max 48GB
llama-3-community View details →
Qwen2.5 72B 41 GB
Parameters 72B
Provider Alibaba
Context 131K tokens
Min GPU M4 Max 48GB
Apache-2.0 View details →
Llama 3.1 70B 40 GB
Parameters 70B
Provider Meta
Context 131K tokens
Min GPU M4 Max 48GB
llama-3.1-community View details →
DeepSeek R1 Distill 32B 19.2 GB
Parameters 32B
Provider DeepSeek
Context 128K tokens
Min GPU RX 7900 XT
MIT View details →
Qwen3 235B-A22B 129.3 GB
Parameters 235B
Provider Alibaba
Context 131K tokens
Min GPU M3 Ultra
Apache-2.0 View details →
Qwen2.5 32B 19.2 GB
Parameters 32B
Provider Alibaba
Context 131K tokens
Min GPU RX 7900 XT
Apache-2.0 View details →
Command R+ 59 GB
Parameters 104B
Provider Cohere
Context 131K tokens
Min GPU M4 Ultra
CC-BY-NC-4.0 View details →
Qwen3.5 35B-A3B 19.3 GB
Parameters 35B
Provider Alibaba
Context 128K tokens
CPU speed 8 tok/s
Apache-2.0 View details →
Gemma 2 27B 15 GB
Parameters 27B
Provider Google
Context 8K tokens
Min GPU M1 Pro
Gemma View details →
Gemma 3 27B 16.2 GB
Parameters 27B
Provider Google
Context 128K tokens
Min GPU M3 Pro
Gemma View details →
Mistral Small 4 65.5 GB
Parameters 119B
Provider Mistral AI
Context 256K tokens
Min GPU M4 Ultra
Apache-2.0 View details →
Mixtral 8x7B 26 GB
Parameters 46.7B
Provider Mistral AI
Context 33K tokens
Min GPU RTX 5090
Apache-2.0 View details →
Mistral Small 3 14.4 GB
Parameters 24B
Provider Mistral AI
Context 33K tokens
Min GPU M1 Pro
Apache-2.0 View details →
Phi-4 8.4 GB
Parameters 14B
Provider Microsoft
Context 16K tokens
Min GPU RTX 3080
MIT View details →
Qwen3 32B 17.6 GB
Parameters 32B
Provider Alibaba
Context 128K tokens
CPU speed 2 tok/s
Apache-2.0 View details →
Qwen3 30B-A3B 16.5 GB
Parameters 30B
Provider Alibaba
Context 131K tokens
Min GPU M3 Pro
Apache-2.0 View details →
DeepSeek R1 Distill 14B 8.4 GB
Parameters 14B
Provider DeepSeek
Context 128K tokens
Min GPU RTX 3080
MIT View details →
Qwen3.5 27B 14.9 GB
Parameters 27B
Provider Alibaba
Context 128K tokens
CPU speed 3 tok/s
Apache-2.0 View details →
Magistral Small 24B 13.2 GB
Parameters 24B
Provider Mistral AI
Context 128K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
Yi 1.5 34B 20 GB
Parameters 34B
Provider 01.AI
Context 4K tokens
Min GPU RX 7900 XT
Apache-2.0 View details →
Qwen2.5 14B 8.4 GB
Parameters 14B
Provider Alibaba
Context 131K tokens
Min GPU RTX 3080
Apache-2.0 View details →
Mistral Small 3.2 13.2 GB
Parameters 24B
Provider Mistral AI
Context 128K tokens
CPU speed 1 tok/s
Apache-2.0 View details →
Qwen3 14B 7.7 GB
Parameters 14B
Provider Alibaba
Context 131K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
Gemma 3 12B 7.2 GB
Parameters 12B
Provider Google
Context 128K tokens
Min GPU RTX 3050 8GB
Gemma View details →
Phi-3 Medium 8 GB
Parameters 14B
Provider Microsoft
Context 128K tokens
Min GPU RTX 3050 8GB
MIT View details →
DeepSeek R1 Distill 8B 4.8 GB
Parameters 8B
Provider DeepSeek
Context 128K tokens
CPU speed 8 tok/s
MIT View details →
Mistral Nemo 12B 7 GB
Parameters 12B
Provider Mistral AI
Context 131K tokens
CPU speed 6 tok/s
Apache-2.0 View details →
Qwen3.5 9B 5 GB
Parameters 9B
Provider Alibaba
Context 128K tokens
CPU speed 12 tok/s
Apache-2.0 View details →
Qwen3 8B 4.4 GB
Parameters 8B
Provider Alibaba
Context 128K tokens
CPU speed 9 tok/s
Apache-2.0 View details →
Gemma 2 9B 5.5 GB
Parameters 9B
Provider Google
Context 8K tokens
Min GPU GTX 1660 Super
Gemma View details →
Phi-3.5 MoE 21 GB
Parameters 41.9B
Provider Microsoft
Context 131K tokens
Min GPU M4 Pro
MIT View details →
Phi-4 Mini 2.1 GB
Parameters 3.8B
Provider Microsoft
Context 128K tokens
CPU speed 30 tok/s
MIT View details →
Llama 3.1 8B 5 GB
Parameters 8B
Provider Meta
Context 131K tokens
CPU speed 7 tok/s
llama-3.1-community View details →
Qwen2.5 7B 4.5 GB
Parameters 7B
Provider Alibaba
Context 131K tokens
CPU speed 8 tok/s
Apache-2.0 View details →
DeepSeek V2 Lite 9 GB
Parameters 16B
Provider DeepSeek
Context 33K tokens
Min GPU RTX 3080
DeepSeek View details →
Mistral 7B 4.5 GB
Parameters 7B
Provider Mistral AI
Context 33K tokens
CPU speed 8 tok/s
Apache-2.0 View details →
Yi 1.5 9B 5.5 GB
Parameters 9B
Provider 01.AI
Context 4K tokens
Min GPU GTX 1660 Super
Apache-2.0 View details →
Phi-3 Small 4.5 GB
Parameters 7B
Provider Microsoft
Context 128K tokens
CPU speed 8 tok/s
MIT View details →
Qwen3.5 4B 2.6 GB
Parameters 4.66B
Provider Alibaba
Context 262K tokens
CPU speed 12 tok/s
Apache-2.0 View details →
Qwen3 4B 2.2 GB
Parameters 4B
Provider Alibaba
Context 131K tokens
CPU speed 15 tok/s
Apache-2.0 View details →
Gemma 3 4B 2.4 GB
Parameters 4B
Provider Google
Context 128K tokens
CPU speed 16 tok/s
Gemma View details →
Phi-3.5 Mini 2.3 GB
Parameters 3.8B
Provider Microsoft
Context 128K tokens
CPU speed 13 tok/s
MIT View details →
DeepSeek R1 Distill 1.5B 1 GB
Parameters 1.5B
Provider DeepSeek
Context 128K tokens
CPU speed 35 tok/s
MIT View details →
Yi 1.5 6B 3.7 GB
Parameters 6B
Provider 01.AI
Context 4K tokens
CPU speed 9 tok/s
Apache-2.0 View details →
Phi-3 Mini 2.5 GB
Parameters 3.8B
Provider Microsoft
Context 128K tokens
CPU speed 14 tok/s
MIT View details →
Qwen3.5 2B 1.2 GB
Parameters 2.27B
Provider Alibaba
Context 262K tokens
CPU speed 22 tok/s
Apache-2.0 View details →
Qwen3 1.7B 0.9 GB
Parameters 1.7B
Provider Alibaba
Context 131K tokens
CPU speed 35 tok/s
Apache-2.0 View details →
Gemma 2 2B 1.5 GB
Parameters 2B
Provider Google
Context 8K tokens
CPU speed 32 tok/s
Gemma View details →
Qwen2.5 3B 1.9 GB
Parameters 3B
Provider Alibaba
Context 131K tokens
CPU speed 20 tok/s
Apache-2.0 View details →
Llama 3.2 3B 1.8 GB
Parameters 3B
Provider Meta
Context 131K tokens
CPU speed 18 tok/s
llama-3.2-community View details →
Gemma 3 1B 0.7 GB
Parameters 1B
Provider Google
Context 128K tokens
CPU speed 42 tok/s
Gemma View details →
Qwen2.5 1.5B 1 GB
Parameters 1.5B
Provider Alibaba
Context 131K tokens
CPU speed 38 tok/s
Apache-2.0 View details →
Llama 3.2 1B 0.6 GB
Parameters 1B
Provider Meta
Context 131K tokens
CPU speed 52 tok/s
llama-3.2-community View details →
Qwen2.5 0.5B 0.35 GB
Parameters 0.5B
Provider Alibaba
Context 131K tokens
CPU speed 95 tok/s
Apache-2.0 View details →

Code Generation (13)

Specialized models for writing, reviewing, and explaining code

Qwen2.5-Coder 32B 19.2 GB
Parameters 32B
Provider Alibaba
Context 131K tokens
Min GPU RX 7900 XT
Apache-2.0 View details →
Qwen3-Coder-Next 80B-A3B 44 GB
Parameters 80B
Provider Alibaba
Context 262K tokens
Min GPU M4 Max 48GB
Apache-2.0 View details →
Qwen3-Coder 30B-A3B 16.5 GB
Parameters 30B
Provider Alibaba
Context 262K tokens
Min GPU M3 Pro
Apache-2.0 View details →
Devstral Small 2 24B 13.2 GB
Parameters 24B
Provider Mistral AI
Context 256K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
CodeLlama 34B 19 GB
Parameters 34B
Provider Meta
Context 16K tokens
Min GPU RX 7900 XT
llama-2-community View details →
DeepSeek Coder V2 9 GB
Parameters 16B
Provider DeepSeek
Context 131K tokens
Min GPU RTX 3080
DeepSeek View details →
Qwen2.5 Coder 14B 8 GB
Parameters 14B
Provider Alibaba
Context 131K tokens
CPU speed 5 tok/s
Apache-2.0 View details →
StarCoder 2 15B 9 GB
Parameters 15B
Provider BigCode
Context 16K tokens
Min GPU RTX 3080
BigCode OpenRAIL-M v1 View details →
Qwen2.5-Coder 7B 4.2 GB
Parameters 7B
Provider Alibaba
Context 131K tokens
CPU speed 9 tok/s
Apache-2.0 View details →
StarCoder 2 7B 4.5 GB
Parameters 7B
Provider BigCode
Context 16K tokens
CPU speed 8 tok/s
BigCode OpenRAIL-M v1 View details →
CodeGemma 7B 4.5 GB
Parameters 7B
Provider Google
Context 8K tokens
Min GPU GTX 1660 Super
Gemma View details →
CodeLlama 7B 4.5 GB
Parameters 7B
Provider Meta
Context 16K tokens
CPU speed 8 tok/s
llama-2-community View details →
StarCoder 2 3B 1.9 GB
Parameters 3B
Provider BigCode
Context 16K tokens
CPU speed 18 tok/s
BigCode OpenRAIL-M v1 View details →
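A catalog like the one above is easy to filter programmatically by a VRAM budget, which is how a figure like "49 run on 8 GB" is derived. The snippet below is a small sketch using a handful of Code Generation entries transcribed from this page; the `fits` helper is illustrative, not part of any site API.

```python
# A small sample of (model name, VRAM needed in GB) pairs from the
# Code Generation section above.
catalog = [
    ("Qwen2.5-Coder 32B", 19.2),
    ("Qwen2.5 Coder 14B", 8.0),
    ("Qwen2.5-Coder 7B", 4.2),
    ("StarCoder 2 3B", 1.9),
]

def fits(budget_gb: float) -> list[str]:
    """Return the models whose VRAM requirement fits the budget."""
    return [name for name, need in catalog if need <= budget_gb]

print(fits(8.0))  # ['Qwen2.5 Coder 14B', 'Qwen2.5-Coder 7B', 'StarCoder 2 3B']
```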