21 November 2025
Kotaemon: Open-Source RAG with UI - A Complete Guide to Features, Setup, Usage, and Customization
Executive Summary
Kotaemon is an open-source, Gradio-powered Retrieval-Augmented Generation (RAG) UI developed by Cinnamon, designed for seamless chatting with documents. Launched in 2024, it has amassed 24.6k GitHub stars and 2k forks as of November 2025, with active development up to v0.11.0 (July 2025, per the GitHub repo). It bridges end-users (simple document QA) and developers (custom RAG pipelines), supporting local/private LLMs (Ollama, LlamaCPP), cloud APIs (OpenAI, Azure, Groq), hybrid retrieval (vector + full-text + reranking), multimodal parsing (figures/tables via Docling/Unstructured), GraphRAG/LightRAG, agents (ReAct, ReWOO), and multi-user collaboration.
This comprehensive 2025 guide covers:
- Installation (Docker, pip, scripts, HF Spaces).
- Core UI/Usage (tabs, collections, chats).
- Model Management (50+ providers).
- Retrieval Pipelines (hybrid, GraphRAG).
- Multimodal & Advanced QA.
- Customization/Extensibility (pipelines, components).
- Deployment/Scaling.
- Troubleshooting/Best Practices.
Key Takeaways:
- Easiest Start: Docker `lite` image or offline ZIP scripts (~20 mins).
- Power User: local Ollama + NanoGraphRAG.
- Dev: subclass `BaseComponent` for pipelines.
- Verify: test on Live Demo #1.
Limitations: no native mobile app; GraphRAG indexing is officially OpenAI/Ollama-only; Unstructured dependencies are OS-specific. Open Questions: will v0.12+ support newer agent types?
1. Introduction to Kotaemon
What is Kotaemon?
Kotaemon (“古塔衛門”, roughly “ancient guardian”) is a clean, customizable RAG application with a web UI. It ingests files (PDFs, DOCX, images), indexes them (vector/full-text/graph), retrieves relevant chunks, and generates answers via LLMs with citations and previews.
Target Audiences:
+-----------------------------------+
| End Users: UI for QA on docs      |
|  +-----------------------------+  |
|  | Developers: Build RAG pipes |  |
|  |  +-----------------------+  |  |
|  |  | Contributors: PRs     |  |  |
|  |  +-----------------------+  |  |
|  +-----------------------------+  |
+-----------------------------------+
Why Kotaemon?
- vs. Alternatives (e.g., Haystack, LlamaIndex UIs): Gradio-native (easy theming), multi-user (private/public collections), in-browser PDF viewer w/ highlights.
- RAG Strengths: Hybrid retriever (BM25 + dense), Cohere/LLM reranking, low-relevance warnings.
- Stats: 24.6k stars, Apache-2.0, Python 3.10+.
Core Architecture:
- Frontend: Gradio (custom theme: kotaemon-gradio-theme).
- Backend: `ktem` library (pip-installable), `BaseComponent` modular system.
- Storage: `./ktem_app_data` (SQLite, Chroma/LanceDB vectors).
2. Installation: 8 Methods (Easy to Advanced)
System Reqs: Python 3.10+, optional Docker/Unstructured (for DOCX/PPTX).
2.1 Easiest: Offline ZIP Scripts (End-Users, 20 mins)
- Download `kotaemon-app.zip` from the latest release.
- Unzip, then run the launcher in `scripts/`:

| OS | Script | Run Command |
|---|---|---|
| Windows | `run_windows.bat` | Double-click |
| macOS | `run_macos.sh` | Right-click → Open with Terminal |
| Linux | `run_linux.sh` | `bash run_linux.sh` |

- Launches `http://localhost:7860` (default login: admin/admin).
- Data lives in `./ktem_app_data` (back up / migrate this folder).
Pro Tip: Always re-run script post-updates.
2.2 Docker (Recommended, Production)
Images: lite (PDF/HTML/XLSX), full (+Unstructured for DOC/DOCX), ollama (bundled local LLMs). Platforms: amd64/arm64.
# Lite (most users)
# On Apple Silicon (M1/M2), add: --platform linux/arm64
docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 \
  -v ./ktem_app_data:/app/ktem_app_data -p 7860:7860 -it --rm \
  ghcr.io/cinnamon/kotaemon:main-lite
# Full + Ollama
docker run ... ghcr.io/cinnamon/kotaemon:main-ollama
- Access: `localhost:7860`.
- Images: hosted on GHCR.
2.3 Pip (Developers)
conda create -n kotaemon python=3.10 && conda activate kotaemon
git clone https://github.com/Cinnamon/kotaemon && cd kotaemon
pip install -e "libs/kotaemon[all]" -e "libs/ktem"
# Optional: Unstructured for extras
python app.py # Or launch.sh
- Copy `.env` from `.env.example` (API keys).
- PDF viewer: download PDF.js to `libs/ktem/ktem/assets/prebuilt`.
2.4 HF Spaces (Online, 10 mins)
- Duplicate the Space.
- Wait for the build (~10 mins).
- Setup: add a Cohere API key (free tier).
2.5 Colab (Local RAG Testing)
2.6 Advanced: GraphRAG/LightRAG
| Variant | Setup |
|---|---|
| NanoGraphRAG (recommended) | `pip install nano-graphrag`; then `pip uninstall hnswlib chroma-hnswlib && pip install chroma-hnswlib` (see Issue #440); run `USE_NANO_GRAPHRAG=true python app.py` |
| LightRAG | `pip install git+https://github.com/HKUDS/LightRAG.git` (fix dependency conflicts); run `USE_LIGHTRAG=true python app.py` |
| MS GraphRAG | `pip install "graphrag<=0.3.6" future`; set `GRAPHRAG_API_KEY=...` (OpenAI/Ollama only); `USE_CUSTOMIZED_GRAPHRAG_SETTING=true` + edit `settings.yaml` |
2.7 Multimodal Parsers
| Loader | Install/Setup |
|---|---|
| Docling (Local) | pip install docling |
| Azure Doc Intel | API key in Resources |
| Adobe PDF Extract | API key |
Select in Settings → Retrieval → File Loader.
2.8 Custom Env/Configs
- `flowsettings.py`: docstore (Elasticsearch/LanceDB), vectorstore (Chroma/Milvus), reasonings.
- `.env`: `OPENAI_API_KEY=sk-...`, `OLLAMA_MODEL=llama3.1:8b`.
Verification: Post-install, check Resources tab for models.
3. Core UI and Basic Usage
Tabs:
- Chat: New chat, history, share/export.
- File Index: Upload/index collections (private/public).
- Resources: Manage LLMs/Embeddings/Rerankers.
- Settings: Retrieval (top-K, prompts), User Mgmt (multi-user/SSO v0.11).
Workflow:
- Upload/Index: Drag files → Index (hybrid: split → embed → store).
- Select Collection/Reasoning: e.g., FullQA, DecomposeQA.
- Query: e.g., “Summarize the table on page 5” → answer + citations (relevance scores, PDF highlights).
- Citations: Click → In-browser viewer w/ highlights.
Reasonings (UI Dropdown):
- `FullQAPipeline`: standard RAG.
- `FullDecomposeQAPipeline`: multi-hop question decomposition.
- `ReactAgentPipeline`: tool-using ReAct agent.
- `RewooAgentPipeline`: ReWOO agent.
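For intuition, the multi-hop idea behind the decompose pipeline can be sketched as a toy function with stub "LLM" calls. The names and splitting heuristic here are purely illustrative, not Kotaemon's actual classes:

```python
from typing import Callable, List

def decompose_qa(question: str,
                 decompose: Callable[[str], List[str]],
                 answer_one: Callable[[str], str]) -> str:
    """Multi-hop QA sketch: split the question, answer each part, combine."""
    sub_questions = decompose(question)
    sub_answers = [f"{q} -> {answer_one(q)}" for q in sub_questions]
    return "; ".join(sub_answers)

# Stub decomposer/answerer standing in for real LLM calls.
subs = lambda q: [p.strip() + "?" for p in q.rstrip("?").split(" and ")]
ans = lambda q: f"answer({q})"

print(decompose_qa("Who wrote X and when was it published?", subs, ans))
```

In the real pipeline both steps are LLM calls and each sub-answer is grounded in retrieved chunks; the structure (decompose → answer → combine) is the same.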
4. Model Management: 50+ LLMs/Embeddings
Supported:
- Chat LLMs: OpenAI (GPT-4o), Azure, Cohere, Groq, Ollama (`llama3.1:8b`), LlamaCPP (GGUF, e.g., Qwen1.5-1.8B, ~2 GB RAM).
- Embeddings: OpenAI `text-embedding-ada-002`, FastEmbed, VoyageAI, Nomic (`nomic-embed-text`).
- Rerankers: Cohere, VoyageAI, LLM-based.
Local Setup:
- Ollama: `ollama pull llama3.1:8b` and `ollama pull nomic-embed-text`; then UI: Resources → Add Ollama → Set default.
- LlamaCPP: download a GGUF from Hugging Face, then UI: Add LlamaCpp → set path (e.g., `qwen1_5-1_8b-chat-q8_0.gguf`).
- Scoring/Rerank: select local models in Retrieval Settings.
Pro Tip: budget RAM as model size + ~2 GB buffer. Disable LLM-based scoring on low-resource machines.
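The sizing rule above is just arithmetic; as a quick helper (the 2 GB buffer is this guide's heuristic, not a hard requirement):

```python
def min_ram_gb(model_size_gb: float, buffer_gb: float = 2.0) -> float:
    """Rough minimum RAM for a local GGUF model: weights + working buffer."""
    return model_size_gb + buffer_gb

# e.g. a ~2 GB Qwen1.5-1.8B Q8 GGUF wants roughly 4 GB of free RAM
print(min_ram_gb(2.0))
```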
5. Retrieval Pipelines: Hybrid + Advanced
Default Hybrid [1]:
- Parse (loaders: PDF/Unstructured).
- Split (semantic/chunk).
- Index: Vector (ChromaDB) + Full-text (LanceDB/SimpleFile).
- Retrieve: Hybrid search + Rerank (Cohere/LLM).
- Generate: LLM + Citations.
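The retrieve step above merges dense (vector) and full-text (BM25) rankings. One common, minimal way to fuse such rankings is reciprocal rank fusion (RRF), sketched here independently of Kotaemon's actual implementation:

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]      # vector-search order
full_text = ["d1", "d2", "d4"]  # BM25 order
print(reciprocal_rank_fusion([dense, full_text]))
```

Documents appearing high in both lists bubble to the top; the reranker (Cohere or an LLM) then re-scores this fused candidate set.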
Config (Settings):
- Top-K: 5-20.
- Prompts: Editable.
- Multimodal: toggle `KH_REASONINGS_USE_MULTIMODAL = True` in `flowsettings.py`.
GraphRAG [10]:
- Builds knowledge graphs for global QA.
- Nano: Seamless, auto-detects models.
- MS: custom `settings.yaml` for Ollama.
Web Search Retrievers: Jina/Tavily (add external context).
6. Multimodal Support: Figures, Tables, OCR
Loaders (Settings → File Loader):
| Type | Features | Req |
|---|---|---|
| Text/figures | Native | |
| Unstructured | DOCX/PPTX/OCR | Full Docker |
| Docling | Local multimodal | pip install docling |
| Azure/Adobe | Cloud OCR/tables | API |
QA: Handles images/tables in context → GPT-4V-like via multimodal LLMs.
Example: Query table → Extracts → Answers w/ highlights.
7. Customization & Extensibility (Developer Focus)
Modular System: Everything subclasses BaseComponent.
7.1 flowsettings.py
KH_DOCSTORE = "ktem.storages.docstores.lancedb.LanceDBDocumentStore"
KH_VECTORSTORE = "ktem.storages.vectorstores.chroma.ChromaVectorStore"
KH_REASONINGS = [
"ktem.reasoning.simple.FullQAPipeline",
"ktem.reasoning.react.ReactAgentPipeline",
]
KH_REASONINGS_USE_MULTIMODAL = True
7.2 Custom Reasoning Pipeline
libs/ktem/ktem/reasoning/my_pipeline.py:
from kotaemon.base import BaseComponent
from kotaemon.llms import ChatOpenAI

class MyPipeline(BaseComponent):
    llm: ChatOpenAI

    def __call__(self, query: str):
        # Custom logic, e.g. retrieve context before answering
        return self.llm.invoke(query)
- Add the class path to `KH_REASONINGS` in `flowsettings.py`.
7.3 Custom Indexing
See the examples under `libs/ktem/ktem/index/file/graph/`.
7.4 Gradio Extensions
- Add tabs/components via `app.py`.
- Theme: import `kotaemon-gradio-theme`.

CLI Utils: `kotaemon promptui export my_pipeline --output config.yml`.
8. Deployment & Scaling
- HF/Fly.io: SSO/demo mode (v0.11).
- Multi-User: Groups, rate-limits.
- Backup: copy `ktem_app_data`.
9. Troubleshooting & Best Practices
- Dependency conflicts: reinstall `chroma-hnswlib` to fix `hnswlib` issues with GraphRAG.
- Low relevance: tune top-K and reranking.
- Performance: use the lite Docker image + a small GGUF model.