Local Document Analysis (RAG)
Build a private RAG system to chat with your documents using Ollama and LlamaIndex — no data leaves your machine.
Setup time: 45–90 minutes
Minimum hardware: 16 GB RAM + 8 GB VRAM (or CPU-only for small document sets)
Software: Ollama + LlamaIndex + ChromaDB
Recommended model: Llama 3.1 8B
Install dependencies
Install the LlamaIndex core package plus the Ollama and ChromaDB integrations:

pip install llama-index llama-index-vector-stores-chroma chromadb llama-index-llms-ollama llama-index-embeddings-ollama
Set up Ollama with embedding model
Pull both a chat model and an embedding model:
ollama pull llama3.1:8b
ollama pull nomic-embed-text
nomic-embed-text is optimized for retrieval tasks and needs only ~500 MB of VRAM.
Index your documents
Place PDF/TXT files in a docs/ folder and run the indexing script. LlamaIndex will chunk, embed, and store them in ChromaDB locally.
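A minimal indexing script might look like the following. This is a sketch, not the article's exact script: it assumes Ollama is running locally with the nomic-embed-text model pulled, and the folder name docs/, the on-disk path ./chroma_db, and the collection name "docs" are illustrative choices.

```python
import chromadb
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Embed locally via Ollama -- no data leaves the machine
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Persistent ChromaDB store on disk ("docs" collection name is arbitrary)
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load the files from docs/, then chunk, embed, and store them
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(f"Indexed {len(documents)} documents")
```

Because the Chroma client is persistent, the embeddings survive restarts; re-running the script re-indexes the folder.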
Query your documents
Use the query engine to ask natural-language questions about your documents. The engine retrieves the most relevant chunks via the embedding store, and Llama 3.1 synthesizes an answer with citations to the source chunks.
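A query script could reopen the same store and attach the chat model, roughly as below. This sketch assumes the ./chroma_db path and "docs" collection from the indexing step, and the question string is just an example.

```python
import chromadb
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

# Same local embedding model as indexing; Llama 3.1 8B for answer synthesis
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3.1:8b", request_timeout=300.0)

# Reopen the existing Chroma collection and wrap it in an index
db = chromadb.PersistentClient(path="./chroma_db")
vector_store = ChromaVectorStore(chroma_collection=db.get_or_create_collection("docs"))
index = VectorStoreIndex.from_vector_store(vector_store)

# Retrieve the top 3 chunks, then synthesize an answer from them
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What does the report conclude about Q3 revenue?")
print(response)

# Citations: which chunks the answer was grounded in
for node in response.source_nodes:
    print(node.metadata.get("file_name"), node.score)
```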
Add a web interface
Use Chainlit or Gradio for a browser UI: pip install chainlit. Both support streaming responses.
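Wiring the query engine into Chainlit takes only a message handler. A hedged sketch, assuming a build_query_engine() helper that constructs the engine as in the query step (that helper is hypothetical, not part of Chainlit):

```python
# app.py -- start with: chainlit run app.py
import chainlit as cl

from my_rag import build_query_engine  # hypothetical helper from the query step

query_engine = build_query_engine()

@cl.on_message
async def on_message(message: cl.Message):
    # Run the RAG query and send the synthesized answer back to the browser
    response = query_engine.query(message.content)
    await cl.Message(content=str(response)).send()
```

For token-by-token streaming, build the engine with streaming=True and forward the chunks to the Chainlit message as they arrive.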
Tip: keep document collections under roughly 1,000 pages for good retrieval quality without a GPU.