By the RunAIatHome editorial team. This walkthrough focuses on first-run stability, model management, and local hardware fit.
Complete Ollama Guide
Master the easiest way to run LLMs on your own hardware.
1. What Is Ollama?
Ollama is an open-source tool that makes it incredibly easy to run large language models on your local machine. It handles model downloading, quantization, GPU acceleration, and serving through a simple command-line interface and REST API.
Think of Ollama as Docker for LLMs. It packages everything a model needs into a single downloadable unit, automatically configures GPU acceleration, and provides a consistent interface across operating systems. It supports macOS, Linux, and Windows.
Key Features
- One-command model download and execution
- Automatic GPU detection and acceleration (NVIDIA CUDA, AMD ROCm, Apple Metal)
- Built-in model library with 100+ pre-configured models
- OpenAI-compatible REST API for integration with other apps
- Custom model creation with Modelfiles
- Multi-model support with automatic memory management
2. Installation
macOS
Download the macOS app from the official website or use Homebrew:
brew install ollama

Works natively on Apple Silicon (M1/M2/M3/M4) with Metal GPU acceleration. Intel Macs are supported but significantly slower.
Linux
Use the official install script:
curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service. Make sure you have NVIDIA drivers installed for GPU acceleration. AMD GPUs require ROCm.
Windows
Download the Windows installer from ollama.com/download.
Requires Windows 10 or later. NVIDIA GPUs are fully supported. AMD GPU support is available but may require manual ROCm setup.
Verify installation: Open a terminal and run ollama --version. You should see the installed version number.
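If you want to verify the install from a script rather than by eye, a minimal Python sketch can do the same check. This assumes `ollama` is on the PATH and that `--version` prints a line containing a semantic version such as "ollama version is 0.5.4" (the exact output wording is an assumption, so the parser just looks for the version number):

```python
import re
import shutil
import subprocess

def parse_version(output: str):
    """Extract a semantic version (e.g. "0.5.4") from version output.

    Returns None if no version-shaped token is present.
    """
    match = re.search(r"(\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None

def installed_version():
    """Return the installed Ollama version string, or None if not installed."""
    exe = shutil.which("ollama")  # resolves the binary on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)
```
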
3. Pulling and Running Models
Ollama uses a Docker-like syntax to pull and run models. Here are the essential commands:
Pull a model (download without running):
ollama pull llama3.1

Run a model (pulls automatically if not downloaded):

ollama run llama3.1

Run a specific size variant:

ollama run llama3.1:70b

Pass a prompt directly (note this is a one-shot prompt, not a system prompt; persistent system prompts are set with a Modelfile, covered in section 5):

ollama run llama3.1 "You are a helpful coding assistant"

Popular Models
| Model | Size | VRAM Needed | Best For |
|---|---|---|---|
| llama3.1:8b | 4.7 GB | 6 GB | General chat, quick responses |
| llama3.1:70b | 40 GB | 48 GB | High quality, complex reasoning |
| mistral:7b | 4.1 GB | 6 GB | Fast, efficient general use |
| codellama:13b | 7.4 GB | 10 GB | Code generation and review |
| phi3:mini | 2.3 GB | 4 GB | Lightweight, low-VRAM GPUs |
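The table above can double as a quick helper for choosing a model that fits your GPU. A small sketch, using the VRAM figures copied from the table (they are hard-coded here, not queried from Ollama):

```python
# VRAM requirements in GB, taken from the "Popular Models" table above.
MODELS = {
    "llama3.1:8b": 6,
    "llama3.1:70b": 48,
    "mistral:7b": 6,
    "codellama:13b": 10,
    "phi3:mini": 4,
}

def models_that_fit(vram_gb: float):
    """Return models whose listed VRAM requirement fits, most demanding first."""
    fits = [(need, name) for name, need in MODELS.items() if need <= vram_gb]
    return [name for need, name in sorted(fits, reverse=True)]
```

For example, `models_that_fit(4)` returns only `["phi3:mini"]`, while a 10 GB card can also run `codellama:13b`.
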
4. Model Management
List all downloaded models:
ollama list

Show model details (parameters, quantization, size):

ollama show llama3.1

Delete a model to free disk space:

ollama rm llama3.1:70b

Copy a model (to create a custom variant):

ollama cp llama3.1 my-assistant

Storage tip: Models are stored in ~/.ollama/models by default. Large models (70B+) can take 40+ GB. Make sure you have enough SSD space before pulling them.
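Model management also works over the local API: a GET to /api/tags returns the models on disk as JSON. A hedged Python sketch, assuming the default listen address and a response shaped like `{"models": [{"name": ...}, ...]}`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default listen address

def model_names(tags_response: dict):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_local_models():
    """Fetch the names of all locally downloaded models from the server."""
    with urllib.request.urlopen(OLLAMA_URL + "/api/tags") as resp:
        return model_names(json.load(resp))
```
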
5. Advanced Usage
Using the REST API
Ollama runs a local API server on port 11434. Alongside its native endpoints (such as /api/generate below), it exposes OpenAI-compatible routes under /v1, so any application that speaks the OpenAI API format can point at it.
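As a sketch of client-side integration, here is a plain-Python chat call against the OpenAI-compatible endpoint, assuming the default port and the /v1/chat/completions route:

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str):
    """Build an OpenAI-style chat completion payload (one user turn)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for a single JSON response rather than a stream
    }

def chat(model: str, user_message: str):
    """Send one chat turn to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(build_chat_request(model, user_message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```
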
Generate a response via API:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain quantum computing in simple terms"
}'

Custom Modelfiles
Create custom model configurations with a Modelfile. This lets you set system prompts, adjust parameters, and package custom behavior:
FROM llama3.1
SYSTEM "You are a senior software engineer. Provide concise, practical answers with code examples."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

Build and run your custom model:

ollama create my-coder -f Modelfile
ollama run my-coder

Environment Variables
| Variable | Description | Default |
|---|---|---|
| OLLAMA_HOST | API listen address | 127.0.0.1:11434 |
| OLLAMA_MODELS | Model storage path | ~/.ollama/models |
| OLLAMA_NUM_PARALLEL | Max concurrent requests | 1 |
| OLLAMA_MAX_LOADED_MODELS | Max models in memory | 1 |
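Clients should honor the OLLAMA_HOST override rather than hard-coding the default address. A small sketch of resolving it to a base URL (the defaulting and scheme-handling logic here is this guide's assumption, modeled on the table above):

```python
import os

def api_base(env=None):
    """Resolve the Ollama API base URL from OLLAMA_HOST, falling back to the default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "127.0.0.1:11434")
    # Accept either a bare host:port or a full URL in the variable.
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host
```
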
Recommended GPUs for Running Ollama
Ollama GPU acceleration works best with 4+ GB of VRAM; larger models (13B and up) benefit from 16 GB or more.
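As a rough rule of thumb (this guide's back-of-envelope estimate, not an official formula), a quantized model's weights take about params × bits / 8 gigabytes, plus some overhead for the KV cache and runtime buffers:

```python
def approx_vram_gb(params_billions: float, quant_bits: int = 4, overhead_gb: float = 1.5):
    """Back-of-envelope VRAM estimate for a quantized model.

    params_billions * quant_bits / 8 gives the weight size in GB;
    overhead_gb is a rough allowance for KV cache and runtime buffers.
    """
    return params_billions * quant_bits / 8 + overhead_gb
```

For an 8B model at 4-bit quantization this gives about 5.5 GB, in the same ballpark as the 6 GB listed for llama3.1:8b in the table above; real usage also grows with context length.
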
| Tier | VRAM Range | Cards |
|---|---|---|
| Entry | 8–12 GB | RTX 4060 (8 GB), RTX 3060 (12 GB) |
| Mid | 12–16 GB | RTX 4060 Ti 16GB (16 GB), RTX 4070 (12 GB) |
| High | 24 GB | RTX 4090 (24 GB), RTX 3090 (24 GB) |