💬 Workflow Template

Run a Local Chat AI

Set up a private, offline ChatGPT-style assistant on your own hardware with Ollama and Llama 3.1.

Setup time

15–30 minutes

Min hardware

8 GB VRAM (e.g. GTX 1080 or RX 580) or Apple Silicon unified memory (M1 or later)

Software

Ollama

Recommended model

Llama 3.1 8B

1. Install Ollama

Download and install Ollama from ollama.com. It runs on macOS, Linux, and Windows. The installer sets up the Ollama server automatically.

Tip:

On Linux, run: curl -fsSL https://ollama.com/install.sh | sh

2. Download your model

Open a terminal and pull the recommended model: ollama pull llama3.1:8b. This downloads ~4.7 GB to your local machine.

Tip:

For 6 GB VRAM, try llama3.2:3b instead (2 GB download)
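The VRAM-to-model mapping above is easy to script if you provision more than one machine. A minimal sketch, assuming the two model tags this guide recommends (the pick_model function and its 8 GB threshold are illustrative, not an Ollama feature):

```shell
# Pick a model tag from available VRAM in GB, following the guide's
# suggestions: 8 GB or more gets llama3.1:8b, less gets llama3.2:3b.
pick_model() {
  if [ "$1" -ge 8 ]; then
    echo "llama3.1:8b"
  else
    echo "llama3.2:3b"
  fi
}

pick_model 8   # llama3.1:8b
pick_model 6   # llama3.2:3b
# Then: ollama pull "$(pick_model 8)"
```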

3. Start your first inference

Run ollama run llama3.1:8b and start chatting directly in the terminal.
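The terminal chat is interactive, but Ollama also serves an HTTP API on localhost port 11434, which is handy for scripting. A sketch of a request body for its /api/generate endpoint (the prompt is just an example; sending it requires the server from step 1 to be running):

```shell
# Build a request body for Ollama's /api/generate endpoint.
# "stream": false asks for a single JSON object instead of a token stream.
body='{"model": "llama3.1:8b", "prompt": "Why is the sky blue?", "stream": false}'
echo "$body"

# With the Ollama server running, send it with:
#   curl -s http://localhost:11434/api/generate -d "$body"
```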

4. (Optional) Add a Web UI

Install Open WebUI for a browser-based chat interface: docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main. Then visit http://localhost:3000.

Tip:

Open WebUI supports multiple models, history, and file uploads

5. Verify your setup

Ask the model "What is local AI?"; you should receive a response within a few seconds. If responses are slow, try a smaller model or a more aggressive quantization.
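Before blaming the model for slowness, it helps to confirm the server is actually reachable. A small health-check sketch: Ollama's root endpoint answers when the server is up, and curl's -f flag turns error responses into failures (the function name and "up"/"down" labels are ours):

```shell
# Report whether the local Ollama server answers on its default port.
# curl -f fails on HTTP errors; --max-time keeps the check snappy.
check_ollama() {
  if curl -fsS --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
    echo "ollama: up"
  else
    echo "ollama: down"
  fi
}

check_ollama
```

If this prints "ollama: down", start the server (or the desktop app) before retrying your prompt.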
