Local Document Analysis (RAG)
Build a private RAG system to chat with your documents using Ollama and LlamaIndex — no data leaves your machine.
Setup time: 45–90 minutes
Minimum hardware: 16 GB RAM + 8 GB VRAM (or CPU-only for small document sets)
Software: Ollama + LlamaIndex + ChromaDB
Recommended model: Llama 3.1 8B
Install dependencies
Install the LlamaIndex core package plus the Ollama and ChromaDB integrations:

pip install llama-index llama-index-vector-stores-chroma chromadb llama-index-llms-ollama llama-index-embeddings-ollama
Set up Ollama with embedding model
Pull both a chat model and an embedding model:
ollama pull llama3.1:8b
ollama pull nomic-embed-text
nomic-embed-text is optimized for retrieval tasks and needs only ~500 MB of VRAM.
Index your documents
Place PDF/TXT files in a docs/ folder and run the indexing script. LlamaIndex will chunk, embed, and store them in ChromaDB locally.
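A minimal indexing script might look like the following. This is a sketch, not the article's exact script: it assumes Ollama is running locally with the nomic-embed-text model pulled, and the folder name docs/, the on-disk path ./chroma_db, and the collection name "docs" are illustrative choices.

```python
import chromadb
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Embed locally via Ollama -- no data leaves the machine
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Persistent ChromaDB store on disk ("docs" collection name is arbitrary)
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load the files from docs/, then chunk, embed, and store them
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(f"Indexed {len(documents)} documents")
```

Because the Chroma client is persistent, the embeddings survive restarts; re-running the script re-indexes the folder.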
Query your documents
Use the query engine to ask natural-language questions about your documents. The engine retrieves the most relevant chunks via the embedding store, and Llama 3.1 synthesizes an answer with citations to the source chunks.
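A query script could reopen the same store and attach the chat model, roughly as below. This sketch assumes the ./chroma_db path and "docs" collection from the indexing step, and the question string is just an example.

```python
import chromadb
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

# Same local embedding model as indexing; Llama 3.1 8B for answer synthesis
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3.1:8b", request_timeout=300.0)

# Reopen the existing Chroma collection and wrap it in an index
db = chromadb.PersistentClient(path="./chroma_db")
vector_store = ChromaVectorStore(chroma_collection=db.get_or_create_collection("docs"))
index = VectorStoreIndex.from_vector_store(vector_store)

# Retrieve the top 3 chunks, then synthesize an answer from them
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What does the report conclude about Q3 revenue?")
print(response)

# Citations: which chunks the answer was grounded in
for node in response.source_nodes:
    print(node.metadata.get("file_name"), node.score)
```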
Add a web interface
Use Chainlit or Gradio for a browser UI: pip install chainlit. Both support streaming responses.
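Wiring the query engine into Chainlit takes only a message handler. A hedged sketch, assuming a build_query_engine() helper that constructs the engine as in the query step (that helper is hypothetical, not part of Chainlit):

```python
# app.py -- start with: chainlit run app.py
import chainlit as cl

from my_rag import build_query_engine  # hypothetical helper from the query step

query_engine = build_query_engine()

@cl.on_message
async def on_message(message: cl.Message):
    # Run the RAG query and send the synthesized answer back to the browser
    response = query_engine.query(message.content)
    await cl.Message(content=str(response)).send()
```

For token-by-token streaming, build the engine with streaming=True and forward the chunks to the Chainlit message as they arrive.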
Tip: keep document collections under roughly 1,000 pages for good retrieval quality without a GPU.