Beginner · 15 min read

By the RunAIatHome editorial team. This walkthrough focuses on first-run stability, model management, and local hardware fit.

Complete Ollama Guide

Master the easiest way to run LLMs on your own hardware.

1. What Is Ollama?

Ollama is an open-source tool that makes it incredibly easy to run large language models on your local machine. It handles model downloading, quantization, GPU acceleration, and serving through a simple command-line interface and REST API.

Think of Ollama as Docker for LLMs. It packages everything a model needs into a single downloadable unit, automatically configures GPU acceleration, and provides a consistent interface across operating systems. It supports macOS, Linux, and Windows.

Key Features

  • One-command model download and execution
  • Automatic GPU detection and acceleration (NVIDIA CUDA, AMD ROCm, Apple Metal)
  • Built-in model library with 100+ pre-configured models
  • OpenAI-compatible REST API for integration with other apps
  • Custom model creation with Modelfiles
  • Multi-model support with automatic memory management

2. Installation

macOS

Download the macOS app from the official website or use Homebrew:

brew install ollama

Works natively on Apple Silicon (M1/M2/M3/M4) with Metal GPU acceleration. Intel Macs are supported but significantly slower.

Linux

Use the official install script:

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service. Make sure you have NVIDIA drivers installed for GPU acceleration. AMD GPUs require ROCm.
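Because the Linux installer registers Ollama as a systemd unit, the standard systemctl and journalctl workflow applies (assuming a systemd-based distribution):

```shell
# Check that the Ollama service is running
systemctl status ollama

# Follow the server logs (useful for spotting GPU detection messages)
journalctl -u ollama -f

# Restart the service after changing its configuration
sudo systemctl restart ollama
```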

Windows

Download the Windows installer from ollama.com/download.

Requires Windows 10 or later. NVIDIA GPUs are fully supported. AMD GPU support is available but may require manual ROCm setup.

Verify installation: Open a terminal and run ollama --version. You should see the installed version number.
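Beyond checking the version, you can confirm the local server itself is up. The root endpoint answers with a short plain-text status message when the server is running:

```shell
# Confirm the CLI is installed
ollama --version

# Confirm the local API server is reachable
# (should respond with a status message such as "Ollama is running")
curl http://localhost:11434
```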

3. Pulling and Running Models

Ollama uses a Docker-like syntax to pull and run models. Here are the essential commands:

Pull a model (download without running):

ollama pull llama3.1

Run a model (pulls automatically if not downloaded):

ollama run llama3.1

Run a specific size variant:

ollama run llama3.1:70b

Run with a one-off prompt (note: the quoted argument is sent as a user prompt, not a system prompt; the model answers once and exits. Persistent system prompts are set with a Modelfile, covered below):

ollama run llama3.1 "Write a Python function that reverses a string"
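ollama run also reads from standard input, which makes one-shot generations easy to script (this sketch assumes llama3.1 is already pulled):

```shell
# Pipe a prompt in; the model responds once and exits
echo "Summarize: Ollama runs LLMs locally." | ollama run llama3.1

# Feed the contents of a file as the prompt
ollama run llama3.1 < notes.txt
```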

Popular Models

Model           Size     VRAM needed   Best for
llama3.1:8b     4.7 GB   6 GB          General chat, quick responses
llama3.1:70b    40 GB    48 GB         High quality, complex reasoning
mistral:7b      4.1 GB   6 GB          Fast, efficient general use
codellama:13b   7.4 GB   10 GB         Code generation and review
phi3:mini       2.3 GB   4 GB          Lightweight, low-VRAM GPUs

4. Model Management

List all downloaded models:

ollama list

Show model details (parameters, quantization, size):

ollama show llama3.1

Delete a model to free disk space:

ollama rm llama3.1:70b

Copy a model (to create a custom variant):

ollama cp llama3.1 my-assistant

Storage tip: Models are stored in ~/.ollama/models. Large models (70B+) can take 40+ GB. Make sure you have enough SSD space before pulling them.

5. Advanced Usage

Using the REST API

Ollama runs a local API server on port 11434. It exposes both a native API (/api/generate, /api/chat) and an OpenAI-compatible endpoint (/v1/chat/completions), so you can integrate it with any application that speaks the OpenAI API format:

Generate a response via API:

curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt": "Explain quantum computing in simple terms" }'
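By default, /api/generate streams the response as a series of JSON lines. Setting "stream": false returns a single JSON object instead, which is often easier to consume from scripts. A quick sketch, assuming the server is running locally and llama3.1 is pulled:

```shell
# Request a complete (non-streamed) response as one JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'

# The same model through the OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello"}]
}'
```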

Custom Modelfiles

Create custom model configurations with a Modelfile. This lets you set system prompts, adjust parameters, and package custom behavior:

FROM llama3.1
SYSTEM "You are a senior software engineer. Provide concise, practical answers with code examples."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

Build and run your custom model:

ollama create my-coder -f Modelfile
ollama run my-coder
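You can inspect the resulting model to confirm the system prompt and parameters were baked in (my-coder here is the custom model created above):

```shell
# Print the Modelfile that defines the custom model
ollama show my-coder --modelfile
```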

Environment Variables

Variable                   Description               Default
OLLAMA_HOST                API listen address        127.0.0.1:11434
OLLAMA_MODELS              Model storage path        ~/.ollama/models
OLLAMA_NUM_PARALLEL        Max concurrent requests   1
OLLAMA_MAX_LOADED_MODELS   Max models in memory      1
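As a sketch, here is how these variables can be applied when launching the server manually; the values below are illustrative examples, not the defaults. On Linux installs managed by systemd, set them in the service unit (for example via systemctl edit ollama) instead:

```shell
# Expose the API on all interfaces, store models on a larger drive,
# and allow two concurrent requests
OLLAMA_HOST=0.0.0.0:11434 \
OLLAMA_MODELS=/mnt/ssd/ollama-models \
OLLAMA_NUM_PARALLEL=2 \
ollama serve
```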

Recommended GPUs for Running Ollama

Ollama GPU acceleration requires 4+ GB of VRAM; 13B+ models need 16 GB.

Prices and availability may change. Affiliate links.

Entry Tier (8–12 GB VRAM)

  • RTX 4060 (8 GB VRAM)
  • RTX 3060 (12 GB VRAM)

Mid Tier (12–16 GB VRAM)

  • RTX 4060 Ti 16GB (16 GB VRAM)
  • RTX 4070 (12 GB VRAM)

High Tier (24 GB VRAM)

  • RTX 4090 (24 GB VRAM)
  • RTX 3090 (24 GB VRAM)

Need a Better GPU for Ollama?

Check our GPU recommendations and find the best card for your budget.