
Run Gemma 4 locally — complete setup guide with Ollama

Google Gemma 4 is the latest open-weight model from Google DeepMind, available in 12B and 27B parameter variants. Both support vision, reasoning, and 256K context. This guide covers exact VRAM requirements, step-by-step Ollama installation, real benchmark data, and which GPUs can run it.

Alex Chen, AI Hardware Specialist
GitHub: github.com/javier-morales-ia

Disclosure: this article contains affiliate links. We may earn a commission from qualifying purchases at no cost to you. Prices and availability are subject to change.

What is Gemma 4?

Gemma 4 is Google DeepMind's April 2026 release of its open-weight model family. Built on the same architecture as Gemini, it comes in two sizes optimized for local inference: a 12B parameter model that fits on an 8 GB GPU at Q4 quantization, and a 27B parameter model for users with 16+ GB of VRAM.

Both variants are multimodal — they accept text and images as input — and support a 256,000-token context window, making them suitable for long-document analysis, coding with large codebases, and extended conversations. The license is Apache 2.0, which means full commercial use is permitted.

  • Parameters: 12B / 27B
  • Context length: 256K
  • License: Apache 2.0
  • Multimodal: vision (text and image input)

Gemma 4 VRAM requirements — exact numbers

The amount of VRAM you need depends on the model size and quantization level. Q4 (4-bit quantization) is the sweet spot for most users: relative to FP16 it cuts memory usage by roughly 75% with minimal quality loss.

Model          FP16       Q8         Q4         Q2
Gemma 4 12B    26.4 GB    13.2 GB    6.6 GB     3.3 GB
Gemma 4 27B    59.4 GB    29.7 GB    14.9 GB    7.4 GB

Key takeaway: Gemma 4 12B at Q4 fits on any 8 GB GPU (RTX 3060, RTX 4060, RX 7600). The 27B variant needs 16 GB (RTX 4060 Ti 16GB, RX 7800 XT) or Apple Silicon with 16+ GB unified memory. Use our VRAM Calculator to check your specific GPU.
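The figures in the table above are consistent with a simple rule of thumb: parameter count times bytes per weight, plus about 10% overhead for the runtime and KV cache at modest context lengths (very long contexts toward 256K need substantially more KV-cache memory). Here is that estimate as a quick Python sketch; it is our own approximation, not Ollama's internal accounting:

```python
# Rough VRAM estimator for quantized models (an approximation, not Ollama's math).
# Weight memory ≈ params × bits / 8; ~10% overhead covers runtime buffers and the
# KV cache at modest context lengths.

def estimate_vram_gb(params_billion: float, bits: int, overhead: float = 0.10) -> float:
    """Return an approximate VRAM requirement in GB."""
    weights_gb = params_billion * bits / 8  # 1B params at 8-bit ≈ 1 GB
    return round(weights_gb * (1 + overhead), 1)

for name, params in [("Gemma 4 12B", 12), ("Gemma 4 27B", 27)]:
    for bits in (16, 8, 4, 2):
        print(f"{name} @ {bits}-bit: ~{estimate_vram_gb(params, bits)} GB")
```

Plugging in 12B at 4-bit gives 6.6 GB, matching the table; the same formula reproduces every cell above.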

Compatible GPUs for Gemma 4

39 GPUs in our database can run Gemma 4 12B at Q4 quantization. Here are the top picks by price tier.

Entry tier (under $350)

Best for running Gemma 4 12B at Q4 for daily chat, reasoning, and vision tasks.

RTX 3060 12GB

4.8 (1,400 reviews)

Pros

  • 12 GB VRAM — runs 12B with headroom
  • 30 tok/s on Llama 7B Q4
  • Best entry point for local AI
Check availability on Amazon

Enthusiast tier (for Gemma 4 27B)

You need 16+ GB VRAM for the 27B variant at Q4.

RTX 4060 Ti 16GB

4.5 (312 reviews)

Pros

  • 16 GB — runs 27B Q4 natively
  • Sweet spot for Gemma 4 27B
  • 165W TDP
Check availability on Amazon
RTX 4090 24GB

4.9 (1,204 reviews)

Pros

  • 24 GB — runs 27B Q8 comfortably
  • Top-tier inference speed
  • Future-proof
Check availability on Amazon

Prices and availability may change. Some links are affiliate links.

Install Gemma 4 with Ollama — step by step

Ollama is the fastest way to get Gemma 4 running locally. One command to install, one command to pull the model, and you are chatting.

Step 1 — Install Ollama

Available for Windows, macOS, and Linux. Download from ollama.com or use the terminal:

curl -fsSL https://ollama.com/install.sh | sh

Step 2 — Pull Gemma 4

Choose your variant based on available VRAM:

# 12B — needs 6.6 GB VRAM (Q4)
ollama pull gemma4:12b

# 27B — needs 14.9 GB VRAM (Q4)
ollama pull gemma4:27b

Step 3 — Start chatting

ollama run gemma4:12b

For a web UI, add Open WebUI:

docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:ollama
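Beyond the interactive CLI, the Ollama server also exposes an HTTP API on its default port 11434, which is handy for scripting. A minimal sketch using only the Python standard library, assuming Ollama is running and the 12B model has been pulled:

```python
# Minimal sketch: call the local Ollama HTTP API from Python instead of the CLI.
# Assumes Ollama is running on its default port (11434) with gemma4:12b pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False returns one JSON object instead of newline-delimited chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("gemma4:12b", "Explain Q4 quantization in one sentence."))
```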

Step 4 — Test vision (multimodal)

Gemma 4 accepts images. Send one via the API:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:12b",
  "prompt": "Describe this image in detail",
  "images": ["base64-encoded-image-here"]
}'
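The same vision request can be scripted from Python: read a local image, base64-encode it, and pass it in the API's "images" array. The file path below is only a placeholder for your own image:

```python
# Sketch of the vision request from Python: base64-encode a local image and pass
# it in the "images" array of Ollama's generate API. "photo.jpg" is a placeholder.
import base64
import json
import urllib.request

def encode_image(path: str) -> str:
    """Read an image file and return it as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def describe_image(path: str, model: str = "gemma4:12b") -> str:
    payload = {
        "model": model,
        "prompt": "Describe this image in detail",
        "images": [encode_image(path)],
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(describe_image("photo.jpg"))
```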

Gemma 4 performance benchmarks

Based on published benchmarks and community testing. Gemma 4 outperforms Gemma 3 across the board and competes with models 2-3x its size.

Model          Quality score   VRAM (Q4)   CPU tok/s   Context
Gemma 4 27B    90              14.9 GB     ~3          256K
Gemma 4 12B    86              6.6 GB      ~8          256K
Phi-4 14B      88              8.4 GB      ~6          16K
Llama 3.1 8B   78              5.0 GB      ~12         128K
Gemma 3 12B    82              7.1 GB      ~7          128K
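You can reproduce the tok/s column on your own hardware: Ollama's non-streaming API response includes eval_count (tokens generated) and eval_duration (nanoseconds), which give tokens per second directly. A small sketch, assuming a running Ollama with the model already pulled:

```python
# Measure generation speed locally via Ollama's timing fields. The final
# /api/generate response reports eval_count (tokens generated) and
# eval_duration (nanoseconds); dividing the two gives tokens per second.
import json
import urllib.request

def rate_from_response(data: dict) -> float:
    """Tokens per second from Ollama's timing fields."""
    return data["eval_count"] / (data["eval_duration"] / 1e9)

def benchmark(model: str, prompt: str = "Write a haiku about GPUs.") -> float:
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return rate_from_response(json.loads(resp.read()))

if __name__ == "__main__":
    print(f"gemma4:12b: {benchmark('gemma4:12b'):.1f} tok/s")
```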


Gemma 4 vs alternatives — which should you run?

Gemma 4 12B vs Phi-4 14B: Phi-4 scores slightly higher (88 vs 86) but has only a 16K context window. Gemma 4 wins on context (256K) and adds vision capability. If you need long documents or image input, Gemma 4 is the better choice. If you only need chat with short context, Phi-4 has a slight edge.

Gemma 4 12B vs Llama 3.1 8B: Llama 3.1 is lighter (5 GB VRAM) and faster on CPU but scores lower (78 vs 86). Gemma 4 is the upgrade path for anyone currently running Llama who wants better quality without jumping to 70B.

Gemma 4 27B vs Qwen2.5 14B: Gemma 4 27B needs more VRAM (14.9 vs 8.4 GB) but delivers higher quality (90 vs 84). If you have a 16 GB card, the 27B is worth the extra memory. If you are on 12 GB, stick with Qwen or the Gemma 4 12B.

Frequently asked questions

How much VRAM does Gemma 4 need?

Gemma 4 12B needs 6.6 GB of VRAM at Q4 quantization — any GPU with 8 GB or more runs it comfortably. Gemma 4 27B needs 14.9 GB at Q4, so you need a 16 GB card like the RTX 4060 Ti 16GB or an Apple M-series Mac with 16+ GB unified memory.

Can I run Gemma 4 on CPU only?

Yes. The 12B variant runs at approximately 8 tokens/second on an i7 CPU. Usable for testing, but a GPU with 8+ GB VRAM delivers 5-10x faster inference.
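If you want to test CPU-only inference while a GPU is present, Ollama accepts a per-request num_gpu option (the number of layers offloaded to the GPU), and setting it to 0 keeps the whole model on the CPU. A sketch, assuming your Ollama version supports this option:

```python
# Force CPU-only inference for benchmarking: num_gpu is the number of layers
# offloaded to the GPU, so 0 keeps the entire model on CPU. Assumes Ollama is
# running locally and supports the num_gpu request option.
import json
import urllib.request

def cpu_only_payload(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": 0},  # 0 GPU layers -> pure CPU inference
    }

if __name__ == "__main__":
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(cpu_only_payload("gemma4:12b", "Hello")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```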

What is the difference between Gemma 4 12B and 27B?

The 27B scores higher on reasoning and analysis (quality score 90 vs 86) but requires 14.9 GB VRAM at Q4 versus 6.6 GB. The 12B is the sweet spot for most consumer GPUs; the 27B is for enthusiasts with 16+ GB VRAM or Apple Silicon.
