Run a Local Chat AI
Set up a private, offline ChatGPT-style assistant on your own hardware with Ollama and Llama 3.
Setup time: 15–30 minutes
Minimum hardware: 8 GB VRAM (GTX 1080 / RX 580 / M1 Mac)
Software: Ollama
Recommended model: Llama 3.1 8B
Install Ollama
Download and install Ollama from ollama.com. It runs on macOS, Linux, and Windows. The installer sets up the Ollama server automatically.
On Linux, run: curl -fsSL https://ollama.com/install.sh | sh
Download your model
Open a terminal and pull the recommended model: ollama pull llama3.1:8b. This downloads ~4.7 GB to your local machine.
For 6 GB VRAM, try llama3.2:3b instead (2 GB download)
Start your first inference
Run ollama run llama3.1:8b and start chatting directly in the terminal. Type /bye to exit the session.
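Beyond the interactive terminal, the Ollama server also answers HTTP requests on its default local port, 11434. Here is a minimal Python sketch that sends a prompt to the /api/generate endpoint; the model name and prompt are just examples, and it assumes the server is running with llama3.1:8b already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    # stream=False asks for one complete JSON response instead of chunked output
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    # POST the prompt to the local Ollama server and return the generated text
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running, try: ask("llama3.1:8b", "What is local AI?")
```

Because everything stays on localhost, no prompt or response ever leaves your machine.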
(Optional) Add a Web UI
Install Open WebUI for a browser-based chat interface: docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main. Then visit localhost:3000.
Open WebUI supports multiple models, history, and file uploads
Verify your setup
Ask the model "What is local AI?" and you should receive a response within a few seconds. If responses are slow, try a smaller model or a lower-precision quantization.
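You can also verify the setup without opening a chat: a successful reply from the server's /api/tags endpoint proves the Ollama server is up, and its model list confirms your download. A small Python sketch, assuming the default port 11434:

```python
import json
import urllib.request

def parse_model_names(tags):
    # The /api/tags response has the shape {"models": [{"name": ...}, ...]}
    return [m["name"] for m in tags.get("models", [])]

def installed_models(base_url="http://localhost:11434"):
    # GET /api/tags lists every model pulled to this machine
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return parse_model_names(json.loads(resp.read()))

# With the server running, installed_models() should include "llama3.1:8b"
```

If this raises a connection error, the Ollama server is not running; start it (or relaunch the desktop app) and retry.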