📄 Workflow Template

Local Document Analysis (RAG)

Build a private RAG system to chat with your documents using Ollama and LlamaIndex — no data leaves your machine.

Setup time

45–90 minutes

Min hardware

16 GB RAM + 8 GB VRAM (or CPU-only for small docs)

Software

Ollama + LlamaIndex + ChromaDB

Recommended model

Llama 3.1 8B

1. Install dependencies

Run:

pip install llama-index llama-index-vector-stores-chroma chromadb llama-index-llms-ollama llama-index-embeddings-ollama

2. Set up Ollama with an embedding model

Pull both a chat model and an embedding model:

ollama pull llama3.1:8b
ollama pull nomic-embed-text

Tip:

nomic-embed-text is optimized for RAG and only needs ~500 MB VRAM

3. Index your documents

Place PDF/TXT files in a docs/ folder and run the indexing script. LlamaIndex will chunk, embed, and store them in ChromaDB locally.

4. Query your documents

Use the query engine to ask natural-language questions about your documents. The engine retrieves the most relevant chunks from ChromaDB, and Llama 3.1 synthesizes an answer that cites its sources.

5. Add a web interface

Use Chainlit or Gradio for a browser UI: pip install chainlit (or pip install gradio). Both support streaming responses.

Tip:

Keep document collections under 1,000 pages for best retrieval quality without a GPU
