By the RunAIatHome editorial team. This walkthrough focuses on first-run stability, model management, and local hardware fit.
Complete Ollama Guide
Master the easiest way to run LLMs on your own hardware.
1. What Is Ollama?
Ollama is an open-source tool that makes it incredibly easy to run large language models on your local machine. It handles model downloading, quantization, GPU acceleration, and serving through a simple command-line interface and REST API.
Think of Ollama as Docker for LLMs. It packages everything a model needs into a single downloadable unit, automatically configures GPU acceleration, and provides a consistent interface across operating systems. It supports macOS, Linux, and Windows.
Key Features
- One-command model download and execution
- Automatic GPU detection and acceleration (NVIDIA CUDA, AMD ROCm, Apple Metal)
- Built-in model library with 100+ pre-configured models
- OpenAI-compatible REST API for integration with other apps
- Custom model creation with Modelfiles
- Multi-model support with automatic memory management
2. Installation
macOS
Download the macOS app from the official website or use Homebrew:
brew install ollama

Works natively on Apple Silicon (M1/M2/M3/M4) with Metal GPU acceleration. Intel Macs are supported but significantly slower.
Linux
Use the official install script:
curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service. Make sure you have NVIDIA drivers installed for GPU acceleration. AMD GPUs require ROCm.
Windows
Download the Windows installer from ollama.com/download.
Requires Windows 10 or later. NVIDIA GPUs are fully supported. AMD GPU support is available but may require manual ROCm setup.
Verify installation: Open a terminal and run ollama --version. You should see the installed version number.
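If you want to verify the install from a script rather than by eye, a minimal Python sketch can do the same check. This assumes `ollama` is on the PATH and that `--version` prints a line containing a semantic version such as "ollama version is 0.5.4" (the exact output wording is an assumption, so the parser just looks for the version number):

```python
import re
import shutil
import subprocess

def parse_version(output: str):
    """Extract a semantic version (e.g. "0.5.4") from version output.

    Returns None if no version-shaped token is present.
    """
    match = re.search(r"(\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None

def installed_version():
    """Return the installed Ollama version string, or None if not installed."""
    exe = shutil.which("ollama")  # resolves the binary on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)
```
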
3. Pulling and Running Models
Ollama uses a Docker-like syntax to pull and run models. Here are the essential commands:
Pull a model (download without running):
ollama pull llama3.1

Run a model (pulls automatically if not downloaded):

ollama run llama3.1

Run a specific size variant:

ollama run llama3.1:70b

Pass a prompt directly (note this is a one-shot prompt, not a system prompt; persistent system prompts are set with a Modelfile, covered in section 5):

ollama run llama3.1 "You are a helpful coding assistant"

Popular Models
| Model | Size | VRAM Needed | Best For |
|---|---|---|---|
| llama3.1:8b | 4.7 GB | 6 GB | General chat, quick responses |
| llama3.1:70b | 40 GB | 48 GB | High quality, complex reasoning |
| mistral:7b | 4.1 GB | 6 GB | Fast, efficient general use |
| codellama:13b | 7.4 GB | 10 GB | Code generation and review |
| phi3:mini | 2.3 GB | 4 GB | Lightweight, low-VRAM GPUs |
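The table above can double as a quick helper for choosing a model that fits your GPU. A small sketch, using the VRAM figures copied from the table (they are hard-coded here, not queried from Ollama):

```python
# VRAM requirements in GB, taken from the "Popular Models" table above.
MODELS = {
    "llama3.1:8b": 6,
    "llama3.1:70b": 48,
    "mistral:7b": 6,
    "codellama:13b": 10,
    "phi3:mini": 4,
}

def models_that_fit(vram_gb: float):
    """Return models whose listed VRAM requirement fits, most demanding first."""
    fits = [(need, name) for name, need in MODELS.items() if need <= vram_gb]
    return [name for need, name in sorted(fits, reverse=True)]
```

For example, `models_that_fit(4)` returns only `["phi3:mini"]`, while a 10 GB card can also run `codellama:13b`.
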
4. Model Management
List all downloaded models:
ollama list

Show model details (parameters, quantization, size):

ollama show llama3.1

Delete a model to free disk space:

ollama rm llama3.1:70b

Copy a model (to create a custom variant):

ollama cp llama3.1 my-assistant

Storage tip: Models are stored in ~/.ollama/models by default. Large models (70B+) can take 40+ GB. Make sure you have enough SSD space before pulling them.
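Model management also works over the local API: a GET to /api/tags returns the models on disk as JSON. A hedged Python sketch, assuming the default listen address and a response shaped like `{"models": [{"name": ...}, ...]}`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default listen address

def model_names(tags_response: dict):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_local_models():
    """Fetch the names of all locally downloaded models from the server."""
    with urllib.request.urlopen(OLLAMA_URL + "/api/tags") as resp:
        return model_names(json.load(resp))
```
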
5. Advanced Usage
Using the REST API
Ollama runs a local API server on port 11434. Alongside its native endpoints (such as /api/generate below), it exposes OpenAI-compatible routes under /v1, so any application that speaks the OpenAI API format can point at it.
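As a sketch of client-side integration, here is a plain-Python chat call against the OpenAI-compatible endpoint, assuming the default port and the /v1/chat/completions route:

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str):
    """Build an OpenAI-style chat completion payload (one user turn)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for a single JSON response rather than a stream
    }

def chat(model: str, user_message: str):
    """Send one chat turn to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(build_chat_request(model, user_message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```
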
Generate a response via API:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain quantum computing in simple terms"
}'

Custom Modelfiles
Create custom model configurations with a Modelfile. This lets you set system prompts, adjust parameters, and package custom behavior:
FROM llama3.1
SYSTEM "You are a senior software engineer. Provide concise, practical answers with code examples."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

Build and run your custom model:

ollama create my-coder -f Modelfile
ollama run my-coder

Environment Variables
| Variable | Description | Default |
|---|---|---|
| OLLAMA_HOST | API listen address | 127.0.0.1:11434 |
| OLLAMA_MODELS | Model storage path | ~/.ollama/models |
| OLLAMA_NUM_PARALLEL | Max concurrent requests | 1 |
| OLLAMA_MAX_LOADED_MODELS | Max models in memory | 1 |
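Clients should honor the OLLAMA_HOST override rather than hard-coding the default address. A small sketch of resolving it to a base URL (the defaulting and scheme-handling logic here is this guide's assumption, modeled on the table above):

```python
import os

def api_base(env=None):
    """Resolve the Ollama API base URL from OLLAMA_HOST, falling back to the default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "127.0.0.1:11434")
    # Accept either a bare host:port or a full URL in the variable.
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host
```
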
Recommended GPUs for Running Ollama
Ollama GPU acceleration works best with 4+ GB of VRAM; larger models (13B and up) benefit from 16 GB or more.
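As a rough rule of thumb (this guide's back-of-envelope estimate, not an official formula), a quantized model's weights take about params × bits / 8 gigabytes, plus some overhead for the KV cache and runtime buffers:

```python
def approx_vram_gb(params_billions: float, quant_bits: int = 4, overhead_gb: float = 1.5):
    """Back-of-envelope VRAM estimate for a quantized model.

    params_billions * quant_bits / 8 gives the weight size in GB;
    overhead_gb is a rough allowance for KV cache and runtime buffers.
    """
    return params_billions * quant_bits / 8 + overhead_gb
```

For an 8B model at 4-bit quantization this gives about 5.5 GB, in the same ballpark as the 6 GB listed for llama3.1:8b in the table above; real usage also grows with context length.
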
| Tier | VRAM Range | Cards |
|---|---|---|
| Entry | 8–12 GB | RTX 4060 (8 GB), RTX 3060 (12 GB) |
| Mid | 12–16 GB | RTX 4060 Ti 16GB (16 GB), RTX 4070 (12 GB) |
| High | 24 GB | RTX 4090 (24 GB), RTX 3090 (24 GB) |