
Personal local AI assistant

A setup for chatting with an LLM privately and without limits. Your conversations never leave your PC.

Minimum VRAM: 8 GB · For: users who want privacy and want to skip cloud subscriptions · Reference GPU: ~$320–540 (GPU only)

GPUs compatible with this setup

This scenario needs at least 8 GB of VRAM. These GPUs can run it:

Prices and availability may vary. See all NVIDIA GPUs →

Why this setup

This scenario is designed for users who want privacy and want to skip cloud subscriptions. The RTX 4060 8 GB offers the best balance between capacity (the 8 GB VRAM minimum), market availability, and relative cost for this scenario's use cases.

With 8 GB of VRAM you can load the recommended models at Q4 quantization without sacrificing too much quality. The software listed was selected for being open source, actively maintained, and compatible with this tier's hardware.

Step-by-step setup guide

  1. Install Ollama from ollama.com (available for Windows, macOS, and Linux).

  2. Pull your first model: open a terminal and run `ollama pull llama3.1:8b`.

  3. Install Open WebUI with Docker: `docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:ollama` (`--gpus=all` exposes the GPU to the container and requires the NVIDIA Container Toolkit; the volume mounts persist downloaded models and chat history across restarts).

  4. Open http://localhost:3000 in your browser and create your local account.

  5. Select the downloaded model and start chatting.
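Beyond the web UI, you can also talk to the model programmatically. A minimal sketch using Ollama's native REST API, assuming the server is running on its default port 11434 and that the model name matches the one pulled in step 2:

```python
import json
import urllib.request

# Ollama's default local chat endpoint (assumes a stock install from step 1).
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a single-turn, non-streaming chat call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete JSON response instead of chunks
    }

def chat(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server from step 1 running, `chat("llama3.1:8b", "Hello")` returns the model's reply as a plain string; nothing in the exchange leaves your machine.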

Frequently asked questions

What is the minimum VRAM for a personal assistant?

With 8 GB of VRAM you can run 7B–8B parameter models at Q4 smoothly. Phi-4 (14B) at Q4 needs at least 10 GB; an 8 GB RTX 4060 can handle it with some offloading.
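These figures come from a simple back-of-envelope calculation: at Q4, each weight takes roughly half a byte, plus extra memory for the KV cache and runtime buffers. A rough sketch (the ~20% overhead factor is an assumption, not an exact figure, and real usage grows with context length):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes plus an assumed ~20% fudge
    factor for KV cache, activations, and runtime buffers."""
    weight_gb = params_billions * bits_per_weight / 8  # 1e9 weights * bytes each / 1e9
    return round(weight_gb * overhead, 1)

# An 8B model at Q4 (~4 bits/weight):
print(estimate_vram_gb(8, 4))   # ~4.8 GB -> fits in 8 GB with headroom for context
# A 14B model like Phi-4 at Q4:
print(estimate_vram_gb(14, 4))  # ~8.4 GB -> tight on an 8 GB card, hence offloading
```

This is why 7B–8B models are the sweet spot for an 8 GB card, while 14B models only fit by offloading some layers to system RAM.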

Is running a local LLM actually private?

Yes. The model runs 100% on your hardware. Neither the model provider nor any third party sees your conversations. It is the most private option for AI assistants.

What is the difference between Ollama and llama.cpp?

Ollama is an abstraction layer on top of llama.cpp that adds model management, an OpenAI-compatible REST API, and multi-model support. For most users, Ollama is the best choice. llama.cpp is more powerful if you need low-level control.
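The OpenAI-compatible API means existing OpenAI-client code can target the local server just by changing the base URL. A minimal sketch of that request shape, assuming Ollama's default port 11434 (with the official `openai` Python package you would instead point `base_url` at `http://localhost:11434/v1`):

```python
import json
import urllib.request

# Ollama serves an OpenAI-compatible endpoint alongside its native API.
OPENAI_COMPAT_URL = "http://localhost:11434/v1/chat/completions"

def openai_style_request(model: str, prompt: str) -> tuple:
    """Return (url, payload) following the OpenAI chat-completions schema."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return OPENAI_COMPAT_URL, payload

def ask(model: str, prompt: str) -> str:
    """POST to the local server; replies use OpenAI's `choices` shape."""
    url, payload = openai_style_request(model, prompt)
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the request and response shapes match OpenAI's, tools built against the OpenAI API can be repointed at your own GPU with no code changes beyond the URL.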
