Run a Local Chat AI
Set up a private, offline ChatGPT-style assistant on your own hardware with Ollama and Llama 3.
Setup time: 15–30 minutes
Minimum hardware: 8 GB VRAM (GTX 1080 / RX 580 / M1 Mac)
Software: Ollama
Recommended model: Llama 3.1 8B
Install Ollama
Download and install Ollama from ollama.com. It runs on macOS, Linux, and Windows. The installer sets up the Ollama server automatically.
On Linux, run: curl -fsSL https://ollama.com/install.sh | sh
Download your model
Open a terminal and pull the recommended model: ollama pull llama3.1:8b. This downloads ~4.7 GB to your local machine.
For 6 GB VRAM, try llama3.2:3b instead (2 GB download)
Start your first inference
Run ollama run llama3.1:8b and start chatting directly in the terminal. Type /bye to exit the session.
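Beyond the interactive terminal, the Ollama server also answers HTTP requests on its default local port, 11434. Here is a minimal Python sketch that sends a prompt to the /api/generate endpoint; the model name and prompt are just examples, and it assumes the server is running with llama3.1:8b already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    # stream=False asks for one complete JSON response instead of chunked output
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    # POST the prompt to the local Ollama server and return the generated text
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running, try: ask("llama3.1:8b", "What is local AI?")
```

Because everything stays on localhost, no prompt or response ever leaves your machine.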
(Optional) Add a Web UI
Install Open WebUI for a browser-based chat interface: docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main. Then visit localhost:3000.
Open WebUI supports multiple models, history, and file uploads
Verify your setup
Ask the model "What is local AI?" and you should receive a response within a few seconds. If responses are slow, try a smaller model or a lower-precision quantization.
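You can also verify the setup without opening a chat: a successful reply from the server's /api/tags endpoint proves the Ollama server is up, and its model list confirms your download. A small Python sketch, assuming the default port 11434:

```python
import json
import urllib.request

def parse_model_names(tags):
    # The /api/tags response has the shape {"models": [{"name": ...}, ...]}
    return [m["name"] for m in tags.get("models", [])]

def installed_models(base_url="http://localhost:11434"):
    # GET /api/tags lists every model pulled to this machine
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return parse_model_names(json.loads(resp.read()))

# With the server running, installed_models() should include "llama3.1:8b"
```

If this raises a connection error, the Ollama server is not running; start it (or relaunch the desktop app) and retry.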